Dynamic Lexicon Generation for Natural Scene Images




Y. Patel et al.

1 CVIT, IIIT Hyderabad, India
[email protected]
2 Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
{lgomez,marcal,dimos}@cvc.uab.es

Abstract. Many scene text understanding methods approach the end-to-end recognition problem from a word-spotting perspective and take huge benefit from using small per-image lexicons. Such customized lexicons are normally assumed as given and their source is rarely discussed. In this paper we propose a method that generates contextualized lexicons for scene images using only visual information. For this, we exploit the correlation between visual and textual information in a dataset consisting of images and textual content associated with them. Using the topic modeling framework to discover a set of latent topics in such a dataset allows us to re-rank a fixed dictionary in a way that prioritizes the words that are more likely to appear in a given image. Moreover, we train a CNN that is able to reproduce those word rankings but using only the image raw pixels as input. We demonstrate that the quality of the automatically obtained custom lexicons is superior to a generic frequency-based baseline.

Keywords: Scene text · Photo OCR · Scene understanding · Lexicon generation · Topic modeling · CNN

1 Introduction

Reading systems for text understanding in the wild have shown a remarkable increase in performance over the past five years [1,2]. However, the problem is still far from being considered solved, with the best reported methods achieving end-to-end recognition performance of 87% in focused text scenarios [3,4] and 53% in the more difficult problem of incidental text [5]. The best performing end-to-end scene text understanding methods address the problem from a word spotting perspective and benefit greatly from using customized lexicons. The size and quality of these custom lexicons have been shown to have a strong effect on recognition performance [6]. The source of such per-image customized lexicons is rarely discussed. In most academic settings such custom lexicons are artificially created and provided to the algorithm as a form of predefined word queries. But in real-life scenarios lexicons need to be dynamically constructed.

© Springer International Publishing Switzerland 2016
G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part I, LNCS 9913, pp. 395–410, 2016. DOI: 10.1007/978-3-319-46604-0_29
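The core idea behind such dynamically constructed lexicons — re-ranking a fixed dictionary by how likely each word is under an image's latent topics — can be sketched in a few lines. The topic/word distributions and the example words below are purely illustrative, not taken from the paper; in the actual method the per-image topic distribution would come from a trained topic model or from the CNN operating on raw pixels.

```python
# Sketch of contextualized lexicon re-ranking: score every word in a
# fixed dictionary by p(word | image) = sum_t p(word | topic t) * p(topic t | image)
# and sort the dictionary so likely words come first. All probabilities
# here are hand-made for illustration.

def rerank_lexicon(dictionary, word_given_topic, topic_given_image):
    """Return dictionary words sorted by likelihood under the image's topic mix."""
    def score(word):
        return sum(word_given_topic[t].get(word, 0.0) * p_t
                   for t, p_t in enumerate(topic_given_image))
    return sorted(dictionary, key=score, reverse=True)

# Two illustrative latent topics: a "street/shop" topic and a "sports" topic.
word_given_topic = [
    {"sale": 0.30, "open": 0.25, "shop": 0.20, "goal": 0.01},
    {"goal": 0.35, "match": 0.25, "open": 0.05, "sale": 0.01},
]
dictionary = ["goal", "sale", "open", "shop", "match"]

# An image whose visual content suggests mostly the street/shop topic.
ranked = rerank_lexicon(dictionary, word_given_topic, [0.9, 0.1])
print(ranked[:3])  # prints ['sale', 'open', 'shop']
```

A word spotting pipeline would then match detected text regions against only the top-ranked portion of this re-ranked dictionary, rather than the full generic lexicon.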


In one of the few examples in the literature, Wang et al. [7] used Google's "search nearby" functionality to build custom lexicons of businesses that might appear in Google Street View images. In the document analysis domain, different techniques have been used to adapt language models to the context of the document, such as language model adaptation [8] and full-book recognition techniques [9]. Such approaches are nevertheless only feasible on relatively large corpora where word statistics can be effectively computed, and they are not applicable to scene images where text is scarce. On the other han