Scene Text Recognition and Retrieval for Large Lexicons


CVIT, IIIT Hyderabad, Hyderabad, India ([email protected])
Inria, LEAR team, Inria Grenoble Rhône-Alpes, Laboratoire Jean Kuntzmann, CNRS, Univ. Grenoble Alpes, Saint-Martin-d'Hères, France

Abstract. In this paper we propose a framework for recognition and retrieval tasks in the context of scene text images. In contrast to many of the recent works, we focus on the case where an image-specific list of words (the small lexicon setting) is unavailable. We present a conditional random field model defined on potential character locations and the interactions between them. Observing that the interaction potentials computed in the large lexicon setting are less effective than in the case of a small lexicon, we propose an iterative method, which alternates between finding the most likely solution and refining the interaction potentials. We evaluate our method on public datasets and show that it improves over baseline and state-of-the-art approaches. For example, on the IIIT-5K word dataset with a large lexicon containing 0.5 million words, we obtain nearly 15% improvement over baseline methods in both recognition accuracy and precision for our retrieval task.
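The interplay between a CRF, its MAP solution, and iterative refinement of the interaction potentials can be illustrated with a deliberately simplified sketch. Everything below is an assumption for illustration, not the authors' model: the CRF is a chain with one node per character position (the paper's nodes are potential character locations, which are not modelled here), unary costs are given, pairwise costs are add-alpha smoothed negative log bigram probabilities from a word list, and MAP inference is exact Viterbi. The refinement rule (re-estimating bigrams from the `k` lexicon words nearest to the current solution by edit distance) is a hypothetical stand-in for the paper's refinement step. The names `bigram_costs`, `map_chain`, `edit_distance`, `iterative_map`, `unary_from` and the parameters `alpha`, `k` are ours.

```python
import math
from collections import Counter

# Assumption: labels are upper-case Latin letters only (no "null" label).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def bigram_costs(words, alpha=1.0):
    """Pairwise costs -log P(b | a), estimated from `words` with add-alpha smoothing."""
    counts = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            counts[(a, b)] += 1
    costs = {}
    for a in ALPHABET:
        total = sum(counts[(a, b)] for b in ALPHABET) + alpha * len(ALPHABET)
        for b in ALPHABET:
            costs[(a, b)] = -math.log((counts[(a, b)] + alpha) / total)
    return costs


def map_chain(unary, pairwise):
    """Exact MAP (minimum-energy) labelling of a chain CRF via Viterbi.

    Energy = sum_i unary[i][c_i] + sum_i pairwise[(c_i, c_{i+1})].
    """
    best = dict(unary[0])  # lowest energy of any prefix ending in each letter
    backptrs = []
    for costs_i in unary[1:]:
        new_best, ptr = {}, {}
        for b in ALPHABET:
            a_star = min(ALPHABET, key=lambda a: best[a] + pairwise[(a, b)])
            new_best[b] = best[a_star] + pairwise[(a_star, b)] + costs_i[b]
            ptr[b] = a_star
        best = new_best
        backptrs.append(ptr)
    letters = [min(ALPHABET, key=best.get)]
    for ptr in reversed(backptrs):  # follow back-pointers from the last position
        letters.append(ptr[letters[-1]])
    return "".join(reversed(letters))


def edit_distance(s, t):
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def iterative_map(unary, lexicon, k=10, max_iters=5):
    """Alternate MAP inference and re-estimation of the pairwise costs.

    Hypothetical refinement rule (our assumption, not the paper's): priors are
    re-estimated from the k lexicon words closest to the current solution.
    sorted() is stable, so ties keep the lexicon's original order.
    """
    word = map_chain(unary, bigram_costs(lexicon))  # generic, whole-lexicon priors
    for _ in range(max_iters):
        neighbours = sorted(lexicon, key=lambda w: edit_distance(w, word))[:k]
        new_word = map_chain(unary, bigram_costs(neighbours))
        if new_word == word:  # converged: the solution no longer changes
            break
        word = new_word
    return word


def unary_from(prefs, default=100.0):
    """Hypothetical helper: per-position letter costs, `default` for unlisted letters."""
    return [{c: p.get(c, default) for c in ALPHABET} for p in prefs]


if __name__ == "__main__":
    # Toy unary costs: the image evidence slightly prefers "A" over "O" in the middle.
    unary = unary_from([{"C": 0.0}, {"A": 0.0, "O": 0.5}, {"T": 0.0}])
    lexicon = ["CAT", "COD", "COP", "COB", "COG", "LOT", "HOT", "NOT", "POT", "ROT"]
    print(map_chain(unary, bigram_costs(lexicon)))  # COT
    print(iterative_map(unary, lexicon, k=3))  # CAT
```

On this toy input, priors estimated from the whole word list favour the frequent bigrams C-O and O-T, so a single pass returns COT although the unary costs prefer A. After re-estimating the priors from three words near COT (all ten words are at edit distance 1, so the stable sort keeps the first three in list order), the next pass returns CAT and the loop then stops. Restricting the sketch to a chain keeps each MAP pass exact at O(n·|Σ|²), where n is the number of positions and |Σ| the alphabet size.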

1 Introduction

Text can play an important role in understanding street view images. In light of this, many attempts have been made to recognize scene text [1–6]. Scene text recognition is a challenging problem, and its recent success is mostly limited to the small lexicon setting, where an image-specific lexicon containing the ground truth word is provided. Typically, these lexicons contain only 50 words [3]. This setting has many practical applications, but it does not scale well. As an example, consider the scenario of assisting visually impaired people in finding books by their titles in a library. Here the lexicon is populated with all the book titles, so its size can range from a few thousand to a million words, and methods designed for the small lexicon setting become less accurate. For instance, when the lexicon size increases from 50 to 1000, recognition accuracy drops by more than 10% [6,7]. In other words, the general problem of scene text recognition, i.e., recognition with the help of a large lexicon (say, a million dictionary words), is far from being solved. In this paper, we investigate this problem.

One way to address the task of recognizing scene text is to pose the problem in a conditional random field (CRF) framework and obtain the maximum a posteriori (MAP) solution, as proposed in [3,4,7–10]. In these frameworks,

© Springer International Publishing Switzerland 2015. D. Cremers et al. (Eds.): ACCV 2014, Part I, LNCS 9003, pp. 494–508, 2015. DOI: 10.1007/978-3-319-16865-4_32

[Fig. 1: word images not recoverable. Top-5 diverse solutions (ranked) for each example word image:]
Example 1: PITA, PASP, ENEP, PITT, AWAP
Example 2: AUM, NIM, COM, MUA, PLL
Example 3: MINSTER, MINSHER, GRINNER, MINISTR, MONSTER
Example 4: BRKE, BNKE, BIKE, BAKE, BOKE
Example 5: TOLS, TARS, THIS, TOHE, TALP

Fig. 1. Examples where the MAP solution is incorrect, as the pairwise priors become too generic when computed from
