Segmentation-Driven Offline Handwritten Chinese and Arabic Script Recognition

Abstract. The market for handwriting recognition applications is growing rapidly due to continuous advances in OCR technology. This paper summarizes our recent efforts on offline handwritten Chinese script recognition using a segmentation-driven approach. We address two essential problems, namely isolated character recognition and the establishment of a probabilistic segmentation model. To improve isolated character recognition accuracy, we propose a heteroscedastic linear discriminant analysis algorithm to extract more discriminative information from the original character features, and implement a minimum classification error learning scheme to optimize the classifier parameters. In the segmentation stage, information from three different sources, namely geometric layout, character recognition confidence, and a semantic model, is integrated into a probabilistic framework to give the best script interpretation. Experimental results on postal address and bank check recognition demonstrate the effectiveness of the proposed algorithms: a correct recognition rate of more than 80% is achieved on 1,000 handwritten Chinese address items, and the recognition reliability of bank checks is greatly improved by combining the courtesy amount recognition result with the legal amount recognition result. Some preliminary research on Arabic script recognition is also presented.
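To make the idea of integrating three information sources more concrete, here is a minimal sketch of how such a combination can drive a segmentation search. It assumes (these are not the paper's stated formulas) that the page has been over-segmented into primitive pieces, that each source yields a log-probability, that the sources are combined as a weighted log-linear sum, that the semantic model is a character bigram, and that the best interpretation is found with a Viterbi-style dynamic program. The names `best_interpretation`, `geom`, `recog`, `bigram`, and all toy probabilities are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical component models (assumptions, not the paper's definitions):
#   geom(i, j)      -> log-probability that primitives i..j-1 form one character
#   recog(i, j)     -> list of (character, log-probability) candidates for that span
#   bigram(prev, c) -> log-probability of character c following prev ("<s>" = start)
GeomFn = Callable[[int, int], float]
RecogFn = Callable[[int, int], List[Tuple[str, float]]]
BigramFn = Callable[[str, str], float]


def best_interpretation(
    n: int,
    geom: GeomFn,
    recog: RecogFn,
    bigram: BigramFn,
    max_merge: int = 3,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> Tuple[float, str]:
    """Viterbi-style search over ways to group n primitives into characters.

    A path score is a weighted sum of geometric, recognition and bigram
    log-probabilities (a log-linear combination of the three sources).
    """
    wg, wr, ws = weights
    # best[j][c] = (score, text) of the best path covering primitives 0..j-1
    # whose last character is c; keying on c keeps the bigram search exact.
    best: List[Dict[str, Tuple[float, str]]] = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, "")
    for j in range(1, n + 1):
        # A character may merge at most max_merge consecutive primitives.
        for i in range(max(0, j - max_merge), j):
            if not best[i]:
                continue
            g = wg * geom(i, j)
            for c, rec_lp in recog(i, j):
                for prev, (score, text) in best[i].items():
                    cand = score + g + wr * rec_lp + ws * bigram(prev, c)
                    if c not in best[j] or cand > best[j][c][0]:
                        best[j][c] = (cand, text + c)
    if not best[n]:
        return -math.inf, ""
    return max(best[n].values())


# Toy data (made up): primitives 0 and 1 are the two halves of 明, primitive 2 is 天.
REC = {
    (0, 1): [("日", math.log(0.9))],
    (1, 2): [("月", math.log(0.8))],
    (0, 2): [("明", math.log(0.7))],
    (2, 3): [("天", math.log(0.9))],
}
GEOM = {(0, 1): math.log(0.4), (1, 2): math.log(0.4),
        (0, 2): math.log(0.8), (2, 3): math.log(0.9)}
BIGRAM = {("<s>", "明"): 0.3, ("明", "天"): 0.5, ("<s>", "日"): 0.1,
          ("日", "月"): 0.05, ("月", "天"): 0.05}


def geom(i: int, j: int) -> float:
    return GEOM.get((i, j), math.log(1e-6))


def recog(i: int, j: int) -> List[Tuple[str, float]]:
    return REC.get((i, j), [])


def bigram(prev: str, c: str) -> float:
    return math.log(BIGRAM.get((prev, c), 1e-4))


score, text = best_interpretation(3, geom, recog, bigram)
print(text)  # prints 明天
```

In this toy case the recognizer alone slightly prefers reading the first two primitives as 日 and 月, but the geometric evidence for merging them and the bigram evidence for 明天 outweigh it. This illustrates why combining the sources can help, though the actual models and weighting used in the paper may differ.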

1 Introduction

Research on handwritten script recognition has received increasing attention in recent years, since it meets the demands of a wide range of commercial applications, such as automatic postal address reading, bank check processing, and recognition of handwritten content in forms. Handwritten script recognition can be categorized in different ways. Depending on how the handwriting is acquired and converted to digital form, the field divides into online and offline script recognition. For online scripts, dynamic temporal information captured by the writing device increases recognition accuracy; for offline scripts, such information is unavailable and recognition accuracy is usually much lower. Depending on the language, scripts can be classified as Roman, Asian, Arabic, etc., and each calls for different recognition strategies according to its own characteristics.

D.S. Doermann and S. Jaeger (Eds.): SACH 2006, LNCS 4768, pp. 196–217, 2008. © Springer-Verlag Berlin Heidelberg 2008

In this paper, we focus on the problem of offline handwritten Chinese script recognition. Compared with Roman script recognition [1][2][3][4], relatively little research has been done in this area. Most published papers address segmentation problems [5][6][7][8][9][10], post-processing [11], or specific applications [12][13][14][15][16]. Chinese handwritten script recognition is a challenging problem for the following reasons: 1) There is great variety in the styles of handwritten scripts. 2) Acc