HMM Parameter Estimation with Genetic Algorithm for Handwritten Word Recognition

1 IBM India Pvt Ltd, BCS Building, Salt Lake, Kolkata - 700091, India
2 Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata - 700108, India
3 Cognizant Technology Solutions, Salt Lake, Kolkata - 700091, India
4 Dept. of Computer and System Sciences, Visva-Bharati, Santiniketan, India

Abstract. This paper presents a recognition system for isolated handwritten Bangla words, with a fixed lexicon, using a Hidden Markov Model (HMM). A stochastic search method, namely a genetic algorithm (GA), is used to train the HMM, which has a left-right topology. For feature extraction, the image boundary is traced in both the anticlockwise and clockwise directions, and the significant changes in direction along the boundary are noted. Certain features defined on the basis of these changes are used in the proposed model.

1 Introduction

Off-line handwritten word recognition is the transcription of handwritten data into a symbolic (ASCII) electronic format. Its applications include reading addresses on mail pieces [1], reading amounts on bank checks [2], extracting census data from forms, and reading address blocks on tax forms. There are two major approaches to recognizing a handwritten word image: the analytical approach [3] and the holistic approach [4]. The analytical approach recognizes the input word image as a series of segmented sub-images, called primitives. The holistic approach, on the other hand, treats the word image as a single, indivisible entity and attempts to recognize the word from its overall shape.

The present work deals with handwritten word recognition in Bangla using a holistic approach. To the best of our knowledge, no such work has been reported on Bangla handwritten word recognition. We consider a set of 117 town names in West Bengal. For feature extraction, the contour of a word image is traced in both the clockwise and anticlockwise directions, and the points at which the direction changes are noted. The sequence of such change points represents the basic shape of the word image. The change points are encoded into change codes, and these codes together with their positions define the feature vector.

A holistic approach based on a discrete hidden Markov model (HMM) is used as the recognition engine here. One HMM is constructed for each word class. The Baum-Welch re-estimation method has traditionally been the first choice for training such an HMM, yet it may overfit the training samples. To alleviate this problem to some extent, a genetic algorithm (GA) is used to optimize the HMM parameters.

A. Ghosh, R.K. De, and S.K. Pal (Eds.): PReMI 2007, LNCS 4815, pp. 536–544, 2007. © Springer-Verlag Berlin Heidelberg 2007
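The scheme outlined in the introduction, one discrete HMM per word class with parameters found by a genetic search, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: observations are plain integer symbol indices standing in for the change-code features, the left-right topology allows only self-loops and moves to the next state, fitness is the summed forward log-likelihood of the training sequences, and the operators (row-wise crossover, multiplicative mutation, elitist selection) and all function names are hypothetical choices.

```python
import math
import random

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | pi, A, B) for a discrete HMM."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total

def total_log_likelihood(model, seqs):
    """GA fitness (assumed): summed log-likelihood of all training sequences."""
    return sum(log_likelihood(s, *model) for s in seqs)

def normalize(row):
    s = sum(row)
    return [x / s for x in row]

def random_left_right_hmm(n_states, n_symbols, rng):
    """Random model; assumed topology: self-loop or step to the next state."""
    pi = [1.0] + [0.0] * (n_states - 1)  # always start in the first state
    A = []
    for i in range(n_states):
        row = [0.0] * n_states
        row[i] = rng.random() + 1e-3
        if i + 1 < n_states:
            row[i + 1] = rng.random() + 1e-3
        A.append(normalize(row))
    B = [normalize([rng.random() + 1e-3 for _ in range(n_symbols)])
         for _ in range(n_states)]
    return pi, A, B

def crossover(a, b, rng):
    """Child inherits each state's transition and emission rows from one parent."""
    pick = [rng.random() < 0.5 for _ in range(len(a[1]))]
    A = [list((a if p else b)[1][i]) for i, p in enumerate(pick)]
    B = [list((a if p else b)[2][i]) for i, p in enumerate(pick)]
    return a[0], A, B

def mutate(model, rng, rate=0.2):
    """Perturb nonzero entries and renormalize; zeros keep the topology intact."""
    def jitter(row):
        return normalize([x * math.exp(rng.gauss(0.0, 0.3))
                          if x > 0 and rng.random() < rate else x for x in row])
    pi, A, B = model
    return pi, [jitter(r) for r in A], [jitter(r) for r in B]

def train_ga(seqs, n_states, n_symbols, pop_size=20, generations=30, seed=0):
    """Elitist GA: keep the better half, refill with mutated offspring."""
    rng = random.Random(seed)
    fitness = lambda m: total_log_likelihood(m, seqs)
    pop = [random_left_right_hmm(n_states, n_symbols, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

def classify(obs, models):
    """Return the word label whose HMM gives obs the highest log-likelihood."""
    return max(models, key=lambda w: log_likelihood(obs, *models[w]))
```

In use, one would call `train_ga` once per word class on that class's training sequences, collect the results in a dictionary keyed by word label, and pass an unseen sequence to `classify`. Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; this says nothing about generalization, which is the concern that motivates the GA here.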

2 HMM for Word Recognition

An HMM consists of three sets of parameters π = {π_i}, A = {a_ij}, and B = {b_jk}, 1 ≤ i, j ≤ N, 1 ≤ k ≤ M, where π is the initial state probability distri