Script Identification Based on HSV Features


School of Information Science and Engineering, Xinjiang University, Urumqi, 830046, Xinjiang, China
[email protected]
Network and Information Center, Xinjiang University, Urumqi, 830046, Xinjiang, China

Abstract. Many scripts in use around the world today have similarly shaped characters. Distinguishing scripts with similarly shaped characters is one of the difficulties in the script identification field, and it needs to be resolved. However, there are few reports on identifying the scripts of Central Asian countries and Chinese minorities, a task that involves identifying similar scripts. In this paper, a multi-script database was established, comprising 2200 plain document images at different resolutions in 11 scripts: English, Chinese, Arabic, Russian, Uyghur, Mongol, Tibet, Turkish, Kyrgyzstani, Uzbekistani and Tajikistani. HSV features were then extracted from each whole-page image and classified using a BP neural network classifier. In experiments on this dataset, the system achieved an average identification rate of 88.14 % and a highest identification rate of 99.0 %. The experimental results indicate that HSV features are effective for identifying these scripts.

Keywords: Script identification · HSV features · BP neural network

1 Introduction

Script identification, that is, identifying the script or language in which a text is written, is a text category identification problem [1–3]. It is needed because regions that use the same script, or that belong to the same ethnic area, may speak different languages. In recent years, automatic script identification has become increasingly popular as a front-end stage of OCR. With the development of computer technology, information processing for minority languages is gradually becoming necessary. In this study, text documents in different scripts were converted into images, and the images were then processed with digital image processing techniques. Since our aim is to classify text document images by script, script identification can be treated as a typical pattern recognition problem. Accordingly, a script identification system has the same structure as a pattern recognition system. It generally consists of several stages, such as document image acquisition, image pre-processing, feature extraction and classification, among which the content and method of feature extraction are particularly important.

© Springer Nature Singapore Pte Ltd. 2016 T. Tan et al. (Eds.): CCPR 2016, Part II, CCIS 663, pp. 588–597, 2016. DOI: 10.1007/978-981-10-3005-5_48
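To make the feature extraction stage named in the introduction more concrete, the following is a minimal sketch of one plausible way to compute HSV features from a whole-page image. This part of the paper does not say how the HSV features are defined, so the per-channel histogram, the function name `hsv_histogram_features`, the `bins` parameter and the toy page are all assumptions made for illustration, not the authors' method. The resulting vector would then be passed to a classifier such as a BP neural network, which is not shown here.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B) triple


def hsv_histogram_features(pixels: Sequence[Pixel], bins: int = 8) -> List[float]:
    """Return a vector of length 3 * bins holding the normalised
    H, S and V histograms of a page (hypothetical feature definition)."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [[0] * bins for _ in range(3)]  # one histogram per channel
    for r, g, b in pixels:
        # colorsys expects floats in [0, 1] and returns (h, s, v) in [0, 1]
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for channel, value in enumerate(hsv):
            # Clamp value == 1.0 into the last bin
            index = min(int(value * bins), bins - 1)
            hist[channel][index] += 1
    total = float(len(pixels))
    # Concatenate the H, S and V histograms, each summing to 1
    return [count / total for channel in hist for count in channel]


# Toy "page": 90 white background pixels and 10 black text pixels.
page = [(255, 255, 255)] * 90 + [(0, 0, 0)] * 10
features = hsv_histogram_features(page, bins=4)
```

Normalising each histogram by the pixel count keeps the feature vector comparable across document images of different resolution, which matters for a database that mixes resolutions.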


The earliest work on script identification addressed English and Latin [4], and attention then gradually shifted to distinguishing East Asian and Latin scripts. Spitz [5] developed an approach for classifying Han-based scripts (Chinese, Japanese and Korean) and Latin-based scripts. In this method, Han-based scripts are identified by analysing the distribution of optical density in the text images, and Latin-based languages used a tech‐
