Graph-Based Keyword Spotting in Historical Handwritten Documents
1 Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Riggenbachstr. 16, 4600 Olten, Switzerland
{michael.stauffer,kaspar.riesen}@fhnw.ch
2 University of Fribourg and HES-SO, 1700 Fribourg, Switzerland
3 Department of Informatics, University of Pretoria, Pretoria, South Africa
Abstract. The number of handwritten documents available in digital form is increasing rapidly. However, we observe a certain lack of accessibility to these documents, especially with respect to searching and browsing. This paper aims to close this gap by means of a novel method for keyword spotting in ancient handwritten documents. The proposed system relies on a keypoint-based graph representation of individual words. Keypoints are characteristic points in a word image and are represented by nodes, while edges represent the strokes between pairs of keypoints. The basic task of keyword spotting is then carried out by a recent approximation algorithm for graph edit distance. The novel framework for graph-based keyword spotting is tested on the George Washington dataset, on which it clearly outperforms a state-of-the-art reference system.
Keywords: Handwritten keyword spotting · Bipartite graph matching · Graph representation for words
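The abstract combines two ingredients: a graph per word whose nodes are keypoints and whose edges are strokes, and an approximate graph edit distance obtained via bipartite graph matching. The Python sketch below illustrates that combination under several assumptions that are not taken from the paper: nodes are labelled with (x, y) coordinates and compared by Euclidean distance, edges are unlabelled, node and edge insertion/deletion costs are the hypothetical constants `TAU_NODE` and `TAU_EDGE`, and all class and function names are illustrative. It builds a square (n+m) × (n+m) assignment cost matrix (substitutions, deletions, insertions, and a zero ε-to-ε block), solves the assignment problem, and returns the cost of the edit path induced by the resulting node mapping. Because that is the cost of a valid edit path, it is an upper bound on the exact graph edit distance. To stay dependency-free, the sketch solves the assignment problem by brute force, which is only feasible for tiny graphs; a practical system would use the Hungarian (Munkres) algorithm or `scipy.optimize.linear_sum_assignment` instead.

```python
import math
from itertools import permutations

TAU_NODE = 1.0  # hypothetical cost of inserting or deleting a node
TAU_EDGE = 1.0  # hypothetical cost of inserting or deleting an edge


class KeypointGraph:
    """Hypothetical word graph: nodes are (x, y) keypoints, edges are strokes."""

    def __init__(self, nodes, edges):
        self.nodes = [tuple(p) for p in nodes]
        # Undirected, unlabelled edges between two distinct nodes,
        # stored as frozensets of node indices.
        self.edges = {frozenset(e) for e in edges}
        self.degree = [0] * len(self.nodes)
        for e in self.edges:
            for v in e:
                self.degree[v] += 1


def cost_matrix(g1, g2):
    """Square (n+m) x (n+m) cost matrix of the bipartite approximation."""
    n, m = len(g1.nodes), len(g2.nodes)
    c = [[math.inf] * (n + m) for _ in range(n + m)]
    for i, p in enumerate(g1.nodes):
        for j, q in enumerate(g2.nodes):
            # Node substitution plus the cheapest assignment of the adjacent,
            # unlabelled edges (surplus edges must be inserted or deleted).
            c[i][j] = math.dist(p, q) + abs(g1.degree[i] - g2.degree[j]) * TAU_EDGE
        # Deleting u_i also deletes all of its edges.
        c[i][m + i] = TAU_NODE + g1.degree[i] * TAU_EDGE
    for j in range(m):
        # Inserting v_j also inserts all of its edges.
        c[n + j][j] = TAU_NODE + g2.degree[j] * TAU_EDGE
        for i in range(n):
            c[n + j][m + i] = 0.0  # epsilon -> epsilon costs nothing
    return c


def solve_assignment(c):
    """Exact assignment by brute force; a stand-in for Munkres (tiny inputs only)."""
    size = len(c)
    return min(permutations(range(size)),
               key=lambda perm: sum(c[r][perm[r]] for r in range(size)))


def edit_path_cost(g1, g2, perm):
    """Cost of the complete edit path induced by the node assignment."""
    n, m = len(g1.nodes), len(g2.nodes)
    phi = {i: perm[i] for i in range(n) if perm[i] < m}  # substituted nodes
    cost = sum(math.dist(g1.nodes[i], g2.nodes[j]) for i, j in phi.items())
    cost += TAU_NODE * ((n - len(phi)) + (m - len(phi)))  # deletions + insertions
    covered = set()
    for e in g1.edges:
        a, b = tuple(e)
        image = frozenset((phi[a], phi[b])) if a in phi and b in phi else None
        if image in g2.edges:
            covered.add(image)  # substitution; free because edges are unlabelled
        else:
            cost += TAU_EDGE  # edge deletion
    cost += TAU_EDGE * len(g2.edges - covered)  # edge insertions
    return cost


def approx_ged(g1, g2):
    """Upper bound on the graph edit distance via bipartite matching."""
    return edit_path_cost(g1, g2, solve_assignment(cost_matrix(g1, g2)))


# Usage: a three-keypoint word graph versus a two-keypoint one.
word_a = KeypointGraph([(0, 0), (3, 0), (3, 4)], [(0, 1), (1, 2)])
word_b = KeypointGraph([(0, 0), (3, 0)], [(0, 1)])
print(approx_ged(word_a, word_b))  # 2.0
```

In the usage example, the two shared keypoints are substituted at zero cost, and the third keypoint of `word_a` is deleted together with its stroke (1.0 + 1.0). For keyword spotting, a query word graph would be compared with every word graph of the document and the words ranked by increasing distance; that retrieval loop is omitted here.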
1 Introduction
Keyword Spotting (KWS) is the task of retrieving all instances of a given query word in speech recordings or text images [1–3]. Textual KWS can be roughly divided into online and offline KWS. In online KWS, temporal information about the handwriting is available, recorded by an electronic input device such as a digital pen or a tablet computer. Offline KWS, on the other hand, is based on scanned images only and is therefore regarded as more difficult than its online counterpart. The focus of this paper is on KWS in historical handwritten documents, for which only offline KWS (referred to simply as KWS from now on) is applicable.

Most of the available KWS methods are based on either template-based or learning-based matching algorithms. Early approaches to template-based KWS rely on a pixel-by-pixel matching of word images [1]. More elaborate approaches to template-based KWS match feature vectors by means of Dynamic Time Warping (DTW) [4]. A recent and promising approach to template-based KWS is the matching of Local Binary Pattern (LBP) histograms [5]. One of the main advantages of template-based KWS is its independence from the actual representation formalism as well as from the underlying language (and alphabet) of the document. However, template-based KWS does not generalise well to different writing styles. Learning-based KWS, on the other hand, is based on statistical models like Hidden Markov Models (HMM) [6,7], Neural Networks (NN) [3] o

© Springer International Publishing AG 2016. A. Robles-Kelly et al. (Eds.): S+SSPR 2016, LNCS 10029, pp. 564–573, 2016. DOI: 10.1007/978-3-319-49055-7_50