Graph-Based Keyword Spotting in Historical Handwritten Documents

1 Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Riggenbachstr. 16, 4600 Olten, Switzerland {michael.stauffer,kaspar.riesen}@fhnw.ch
2 University of Fribourg and HES-SO, 1700 Fribourg, Switzerland [email protected]
3 Department of Informatics, University of Pretoria, Pretoria, South Africa

Abstract. The number of handwritten documents that are digitally available is rapidly increasing. However, these documents remain hard to access, especially with respect to searching and browsing. This paper aims at closing this gap by means of a novel method for keyword spotting in ancient handwritten documents. The proposed system relies on a keypoint-based graph representation for individual words. Keypoints are characteristic points in a word image and are represented by nodes, while edges represent the strokes between two keypoints. The basic task of keyword spotting is then conducted by a recent approximation algorithm for graph edit distance. The novel framework for graph-based keyword spotting is tested on the George Washington dataset, on which it clearly outperforms a state-of-the-art reference system.

Keywords: Handwritten keyword spotting · Bipartite graph matching · Graph representation for words
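To make the idea of matching word graphs more concrete, the following minimal sketch approximates the graph edit distance between two small keypoint graphs through a node assignment. It is an illustration under stated assumptions, not the system evaluated in this paper: nodes are assumed to be labelled with (x, y) keypoint coordinates, edges are assumed to be unlabelled, and the constant costs `tau_v` and `tau_e` as well as all function names are hypothetical. For brevity the assignment is found by exhaustive search, which is only feasible for very small graphs; a practical system would use a cubic-time assignment solver instead.

```python
import itertools
import math

INF = math.inf


def node_distance(p, q):
    """Euclidean distance between two keypoint coordinates (assumed node label)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def node_cost_matrix(nodes1, nodes2, tau_v):
    """Square (n+m) x (n+m) matrix of node substitution/deletion/insertion costs."""
    n, m = len(nodes1), len(nodes2)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:      # substitute node i by node j
                cost[i][j] = node_distance(nodes1[i], nodes2[j])
            elif i < n:              # delete node i (only on the diagonal)
                cost[i][j] = tau_v if j - m == i else INF
            elif j < m:              # insert node j (only on the diagonal)
                cost[i][j] = tau_v if i - n == j else INF
            # remaining block maps dummy to dummy at zero cost
    return cost


def solve_assignment(cost):
    """Exhaustive minimum-cost assignment; a stand-in for a proper solver."""
    size = len(cost)
    best_cost, best_perm = INF, None
    for perm in itertools.permutations(range(size)):
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_perm


def approx_ged(g1, g2, tau_v=1.0, tau_e=1.0):
    """Cost of the edit path induced by the optimal node assignment."""
    (nodes1, edges1), (nodes2, edges2) = g1, g2
    n, m = len(nodes1), len(nodes2)
    perm = solve_assignment(node_cost_matrix(nodes1, nodes2, tau_v))
    # Node i of g1 is either substituted by a node of g2 or deleted (None).
    mapping = {i: (perm[i] if perm[i] < m else None) for i in range(n)}

    total = 0.0
    matched = 0
    for i, j in mapping.items():
        if j is None:
            total += tau_v                      # node deletion
        else:
            total += node_distance(nodes1[i], nodes2[j])
            matched += 1
    total += tau_v * (m - matched)              # node insertions

    # Edge operations implied by the node mapping.
    target_edges = {frozenset(e) for e in edges2}
    preserved = set()
    for a, b in edges1:
        fa, fb = mapping[a], mapping[b]
        image = frozenset((fa, fb)) if fa is not None and fb is not None else None
        if image in target_edges:
            preserved.add(image)                # edge kept, zero cost
        else:
            total += tau_e                      # edge deletion
    total += tau_e * (len(target_edges) - len(preserved))  # edge insertions
    return total


# Two toy word graphs: (list of keypoints, list of edges as index pairs).
query = ([(0, 0), (1, 0), (0, 1)], [(0, 1), (1, 2)])
word = ([(0, 0), (1, 0), (0, 1), (10, 10)], [(0, 1), (1, 2), (2, 3)])
print(approx_ged(query, query), approx_ged(query, word))
```

Because the returned value is the cost of a complete, valid edit path, it can only overestimate the exact graph edit distance. One plausible use for keyword spotting is to compute this distance between the query graph and every word graph of a document and to rank the words by increasing distance.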

1 Introduction

Keyword Spotting (KWS) is the task of retrieving any instance of a given query word in speech recordings or text images [1–3]. Textual KWS can be roughly divided into online and offline KWS. For online KWS, temporal information of the handwriting is available, recorded by an electronic input device such as a digital pen or a tablet computer. Offline KWS, on the other hand, is based on scanned images only and is thus regarded as the more difficult of the two tasks. The focus of this paper is on KWS in historical handwritten documents. Therefore, only offline KWS, referred to as KWS from now on, can be applied.

Most of the available KWS methodologies are based on either template-based or learning-based matching algorithms. Early approaches to template-based KWS are based on a pixel-by-pixel matching of word images [1]. More elaborate approaches to template-based KWS are based on the matching of feature vectors by means of Dynamic Time Warping (DTW) [4]. A recent and promising approach to template-based KWS is given by the matching of Local Binary Pattern (LBP) histograms [5]. One of the main advantages of template-based KWS is its independence from the actual representation formalism as well as the underlying language (and alphabet) of the document. However, template-based KWS does not generalise well to different writing styles. Learning-based KWS, on the other hand, is based on statistical models like Hidden Markov Models (HMM) [6,7], Neural Networks (NN) [3] o

© Springer International Publishing AG 2016. A. Robles-Kelly et al. (Eds.): S+SSPR 2016, LNCS 10029, pp. 564–573, 2016. DOI: 10.1007/978-3-319-49055-7_50
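The DTW-based matching of feature vectors mentioned in the introduction can be illustrated with a short sketch. It assumes that each word image has already been converted into a sequence of fixed-length feature vectors (for example one vector per image column) and that the Euclidean distance serves as local cost; both are assumptions made for illustration, and the function name `dtw_distance` is hypothetical. The sketch shows the standard dynamic programming recurrence without any path constraint or length normalisation.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] holds the cost of the best warping path that aligns
    # the first i vectors of seq_a with the first j vectors of seq_b.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between the two feature vectors.
            local = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1]))
            )
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = local + min(
                acc[i - 1][j],      # seq_a advances, seq_b repeats
                acc[i][j - 1],      # seq_b advances, seq_a repeats
                acc[i - 1][j - 1],  # both sequences advance
            )
    return acc[n][m]


# A template and a stretched variant of it align without any cost.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(template, stretched))
```

Since the warping path may repeat elements of either sequence, two word images of different width can still be compared, which is what makes DTW attractive for template-based KWS.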
