Handwritten Word Image Categorization with Convolutional Neural Networks and Spatial Pyramid Pooling



1 Scytl Secure Electronic Voting, Barcelona, Spain
{JuanIgnacio.Toledo,Jordi.Cucurull}@scytl.com
2 Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
{afornes,josep}@cvc.uab.es
3 Department of Computer Science, TU Dortmund University, Dortmund, Germany
{sebastian.sudholt,gernot.fink}@tu-dortmund.de

Abstract. The extraction of relevant information from historical document collections is a key step in making these documents available for access and search. The usual approach combines transcription and grammars in order to extract semantically meaningful entities. In this paper, we describe a new method to obtain word categories directly from non-preprocessed handwritten word images. The method can be used to extract information directly, offering an alternative to transcription, and can thus serve as a first step in any kind of syntactic analysis. The approach is based on Convolutional Neural Networks with a Spatial Pyramid Pooling layer to deal with the different shapes of the input images. We performed the experiments on a historical marriage record dataset, obtaining promising results.

Keywords: Document image analysis · Word image categorization · Convolutional neural networks · Named entity detection
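The role of the Spatial Pyramid Pooling layer mentioned in the abstract is to map convolutional feature maps of varying width (word images differ in shape) to a fixed-length vector that a fully connected classifier can consume. The following is a minimal NumPy sketch of that pooling operation; the pyramid levels chosen here are illustrative, not necessarily the configuration used in the paper.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map into a fixed-length vector.

    For each pyramid level n, the H x W plane is divided into an
    n x n grid and each cell is max-pooled over its spatial extent,
    so the output length is C * sum(n * n for n in levels)
    regardless of the input height and width.
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        # Cell boundaries, rounded so the n bins exactly cover H and W.
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))  # per-channel max
    return np.concatenate(pooled)

# Word images of different widths yield feature maps of different
# widths, yet the pooled descriptor has identical length:
wide = spatial_pyramid_pool(np.random.rand(64, 8, 40))
narrow = spatial_pyramid_pool(np.random.rand(64, 8, 12))
assert wide.shape == narrow.shape == (64 * (1 + 4 + 16),)
```

This fixed-length property is what lets the network accept non-preprocessed word images without resizing them to a canonical shape.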

1 Introduction

Document Image Analysis and Recognition (DIAR) is the pattern recognition research field devoted to the analysis, recognition, and understanding of document images. Within this field, one of the most challenging tasks is handwriting recognition [3,6], defined as the task of converting the text contained in a document image into a machine-readable format. Indeed, after decades of research, this task is still considered an open problem, especially when dealing with historical manuscripts. The main difficulties are paper degradation, differences in handwriting style across centuries, and old vocabulary and syntax. Generally speaking, handwriting recognition relies on the combination of two models: the optical model and the linguistic model. The former recognizes the visual shape of characters or graphemes, and the latter interprets them in their context based on some structural rules. The linguistic model can range from simple n-grams (probabilities of character or word sequences) to sophisticated syntactic formalisms enriched with semantic information. In this paper we focus on this last concept. Our hypothesis is that, under certain conditions where the text can be roughly described by a grammatical structure, the identification of named entities can boost recognition in a parsing process. Named entity recognition is an information extraction problem consisting in detecting and classifying text terms into pre-defined categories such as names of people, streets, organizations, and dates. It can also be seen as the semantic annotation of text elements.

© Springer International Publishing AG 2016. J.I. Toledo et al., in: A. Robles-Kelly et al. (Eds.), S+SSPR 2016, LNCS 10029, pp. 543–552, 2016. DOI: 10.1007/978-3-319-49055-7_48
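The simplest linguistic model mentioned above, a character n-gram, can be illustrated with a toy bigram estimator. The three-word corpus below is hypothetical and unsmoothed; real recognizers estimate these probabilities from large corpora and apply smoothing.

```python
from collections import Counter

# Toy character-bigram model: P(c2 | c1) estimated by relative frequency.
corpus = ["maria", "marta", "marc"]  # hypothetical training words
bigrams = Counter()
unigrams = Counter()
for word in corpus:
    padded = "^" + word  # '^' marks the start of a word
    for a, b in zip(padded, padded[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def prob(prev, cur):
    """Conditional probability P(cur | prev) from bigram counts."""
    return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0

# The optical model's character hypotheses can be rescored with these
# probabilities before decoding:
assert prob("m", "a") == 1.0      # every 'm' in the corpus precedes 'a'
assert prob("r", "t") == 1 / 3    # 'r' is followed by 'i', 't', or 'c'
```

A full recognizer would combine such probabilities with the optical model's scores, for example in a Viterbi-style search over character sequences.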