Offline Handwritten Devanagari Word Recognition: An HMM Based Approach




Abstract. A hidden Markov model (HMM) for recognition of handwritten Devanagari words is proposed. The HMM has the property that its states are not defined a priori, but are determined automatically from a database of handwritten word images. A handwritten word is assumed to be a string of several stroke primitives. These primitives are the states of the proposed HMM, and they are found using certain mixture distributions. One HMM is constructed for each word. To classify an unknown word image, its class-conditional probability under each HMM is computed. The classification scheme has been tested on a small, recently developed database of handwritten Devanagari words. The classification accuracies on the training and test sets are 87.71% and 82.89%, respectively. Keywords: Hidden Markov Model (HMM), Devanagari Word Recognition, Stroke Primitives.
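The classification rule described in the abstract (one HMM per word; an unknown image is assigned to the word w* = argmax over w of P(O | lambda_w)) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each word image is assumed to have already been converted into a sequence O of discrete observation symbols, and emissions are plain discrete tables, whereas the paper's states are stroke primitives modelled with mixture distributions. The names `WordHMM`, `classify`, `logsumexp`, `safe_log` and all toy parameters are hypothetical. The likelihood P(O | lambda_w) is computed with the standard forward algorithm in log space so that long observation sequences do not underflow.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a sequence of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p: float) -> float:
    """log(p), mapping zero probabilities to -inf."""
    return math.log(p) if p > 0 else -math.inf


@dataclass
class WordHMM:
    """Hypothetical discrete-emission HMM for one word class (an assumption;
    the paper uses stroke-primitive states with mixture distributions).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting observation symbol k in state i
    """
    pi: List[float]
    A: List[List[float]]
    B: List[List[float]]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in log space: returns log P(obs | model)."""
        n = len(self.pi)
        # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
        alpha = [safe_log(self.pi[i]) + safe_log(self.B[i][obs[0]]) for i in range(n)]
        # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + safe_log(self.A[i][j]) for i in range(n)])
                + safe_log(self.B[j][o])
                for j in range(n)
            ]
        # Termination: P(obs | model) = sum_i alpha_T(i)
        return logsumexp(alpha)


def classify(obs: Sequence[int], models: Dict[str, WordHMM]) -> str:
    """Return the word whose HMM assigns obs the highest likelihood."""
    return max(models, key=lambda w: models[w].log_likelihood(obs))


# Toy left-to-right models with invented parameters (not from the paper).
models = {
    "word_a": WordHMM(pi=[1.0, 0.0],
                      A=[[0.6, 0.4], [0.0, 1.0]],
                      B=[[0.9, 0.1], [0.2, 0.8]]),
    "word_b": WordHMM(pi=[1.0, 0.0],
                      A=[[0.6, 0.4], [0.0, 1.0]],
                      B=[[0.1, 0.9], [0.8, 0.2]]),
}

print(classify([0, 0, 1, 1], models))  # word_a
```

Working in log space and summing with `logsumexp` keeps the computation exact up to floating-point rounding while avoiding the underflow that a direct product of many small probabilities would cause.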

1 Introduction

Handwriting recognition is one of the challenging problems in pattern recognition. The problem has been studied for several decades, and many reports on handwriting recognition for the scripts of developed nations are available in the literature. However, only a few works on handwriting recognition in Indian scripts have been reported ([1]-[3]). The present paper deals with the recognition of offline handwritten Devanagari words. Works on the recognition of handwritten Devanagari characters and numerals exist ([4],[5]), but no work on handwritten Devanagari word recognition has been reported. The literature describes two approaches to handwritten word recognition: a local or analytical approach that operates at the character level [6], and a global approach that operates at the word level [7]. The first approach must solve the segmentation problem: words are first segmented into characters or pseudo-characters, and a character model is then used for recognition. Since segmenting a word into characters is itself a challenging problem, the success of the recognition module depends heavily on segmentation performance. The second approach treats the word as a single entity and performs recognition without explicit segmentation. However, this approach is restricted to applications with a small lexicon.

A. Ghosh, R.K. De, and S.K. Pal (Eds.): PReMI 2007, LNCS 4815, pp. 528–535, 2007. © Springer-Verlag Berlin Heidelberg 2007


In the present work we adopt the second approach for two reasons: (a) to avoid the overhead of segmentation, and (b) because no standard benchmark database is available for training the classifier. Since no standard benchmark database was available for Indian scripts, we created a Devanagari word database to test the performance of our system. In this report, training and test results of the proposed approach are presented on the basis of this database. We use a hidden Markov model (HMM) in the proposed scheme for recognition of handwritten Devanagari words. An HMM can exploit both the statistical and the structural information present in handwritten images. This is why HMMs h