Matching Handwritten Document Images



Abstract. We address the problem of predicting similarity between a pair of handwritten document images written by potentially different individuals. This has applications related to matching and mining in image collections containing handwritten content. A similarity score is computed by detecting patterns of text re-usage between document images, irrespective of minor variations in word morphology, word ordering, layout and paraphrasing of the content. Our method does not depend on an accurate segmentation of words and lines. We formulate the document matching problem as a structured comparison of the word distributions across two document images. To match two word images, we propose a convolutional neural network (CNN) based feature descriptor. Performance of this representation surpasses the state-of-the-art on handwritten word spotting. Finally, we demonstrate the applicability of our method on a practical problem of matching handwritten assignments.

Keywords: Handwritten word spotting · CNN features · Plagiarism detection

1 Introduction

Matching two document images has several applications related to information retrieval, such as spotting keywords in historical documents [8], accessing personal notes [22], camera-based interfaces for querying [45], retrieving from video databases [27], automatic scoring of answer sheets [40], and mining and recommending in health care documents [25]. Since OCRs do not work reliably for all types of documents, one resorts to image-based methods for comparing textual content. This problem is even more complex for unconstrained handwritten documents due to the high variation across writers. Moreover, variable placement of words across documents makes rigid geometric matching ineffective. In this work, we design a scheme for matching two handwritten document images. The problem is illustrated in Fig. 1(a). We validate the effectiveness of our method on an application named measure of document similarity (MODS).¹ MODS compares two handwritten document images and provides a normalized score as a measure of similarity between the two images.

¹ In parallel to measure of software similarity (MOSS) [36], which has emerged as the de facto standard across universities for comparing software solutions from students.
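The paper's actual matching scheme is a structured comparison of word distributions, described later; as a minimal illustration of the underlying idea of a normalized, symmetric similarity over per-word feature descriptors, the following sketch uses a simple greedy best-match average (all function names are hypothetical, and cosine similarity over generic vectors stands in for the proposed CNN descriptor comparison):

```python
import math

def cosine(u, v):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def document_similarity(desc_a, desc_b):
    """Normalized score in [0, 1] between two documents, each given as a
    list of word-image descriptors. For every word in one document we take
    its best match in the other, then average the two directed scores so
    the measure is symmetric in its arguments."""
    def directed(src, dst):
        return sum(max(cosine(u, v) for v in dst) for u in src) / len(src)
    return 0.5 * (directed(desc_a, desc_b) + directed(desc_b, desc_a))

# Two documents with identical word descriptors score 1.0;
# documents with orthogonal descriptors score 0.0.
same = document_similarity([[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 0.0], [0.0, 1.0]])   # -> 1.0
```

Note that this greedy matching ignores word order and spatial layout entirely, which is why the paper instead imposes structure on the comparison; the sketch only conveys the notion of a writer-invariant, normalized score.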

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part I, LNCS 9905, pp. 766–782, 2016. DOI: 10.1007/978-3-319-46448-0_46


Fig. 1. (a) Given two document images Di and Dj , we are interested in computing a similarity score S(Di , Dj ) which is invariant to (i) writers, (ii) word flow across lines, (iii) spatial shifts, and (iv) paraphrasing. In this example, the highlighted lines from Di and Dj have almost the same content but they widely differ in terms of spatial arrangement of words. (b) Query-by-text results on searching with “satellite” on an instructional video. The spotted results are highlighted in the frame.

Text is now appreciated as critical information in un