Downtown Osaka Scene Text Dataset
Abstract. This paper presents a new scene text dataset named Downtown Osaka Scene Text Dataset (in short, DOST dataset). The dataset consists of sequential images captured in shopping streets in downtown Osaka with an omnidirectional camera. Unlike most existing datasets, which consist of intentionally captured scene images, the DOST dataset consists of uncontrolled scene images; the use of an omnidirectional camera enabled us to capture videos (sequential images) of the whole scenes surrounding the camera. The dataset preserves the real scenes containing text as they were; in other words, they are scene texts in the wild. The DOST dataset contains 32,147 manually ground-truthed sequential images. They contain 935,601 text regions, consisting of 797,919 legible and 137,682 illegible regions. The legible regions contain 2,808,340 characters. The dataset is evaluated using two existing scene text detection methods and one powerful commercial end-to-end scene text recognition method to gauge its difficulty and quality in comparison with existing datasets.

Keywords: Scene text in the wild · Uncontrolled scene text · Omnidirectional camera · Sequential image · Video · Japanese text

1 Introduction
Text plays important roles in our lives. Imagining life in a world without text, in which, for example, neither books, newspapers, signboards, restaurant menus, texting on smartphones nor program source code exists, or they all exist in a completely different form, we can rediscover not only the necessity of text but also the importance of reading and interpreting it. Although only human beings have been endowed with the ability to read and interpret text, researchers have struggled to enable computers to read text. Focusing on camera-captured text and scene text, some pioneering works were presented in the 1990s [21]. Since then, increasing attention has been paid to recognizing scene text. Table 1 shows the remarkable recent progress of scene text recognition techniques: most of the reported accuracies of the latest methods exceed 90 % on major benchmark datasets. However, does this mean these methods are powerful enough to read the variety of texts in real environments? Many people would agree that the answer is no. Text images contained in these …
Table 1. Recent improvement of recognition performance in scene text recognition tasks. Based on Table 1 of [1], this table summarizes recognition accuracies (in percent) of recent methods on representative benchmark datasets in chronological order. "50", "1k" and "50k" represent lexicon sizes; "Full" and "None" represent recognition with all per-image lexicon words and without a lexicon, respectively.

Year   Method             IIIT5K [2]   SVT [3]   ICDAR03 [4]
       Lexicon            50           50        50
-      ABBYY [3]          24.3         35.0
2011   Wang et al. [3]    -            57.0
[Only these entries are recoverable; the remaining rows for Mishra et al. and later methods (2012-2016), as well as the further lexicon columns (1k, Full, None), did not survive extraction.]
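The lexicon settings in Table 1 determine how a recognizer's raw output is scored. As a rough illustration only (this sketch is not from the DOST paper or any benchmark's official evaluation toolkit; the function names are invented and the edit-distance matching rule is an assumption based on common practice), the following Python snippet shows how word accuracy is typically computed with a per-image lexicon ("50"/"Full") versus without one ("None").

# Minimal sketch of lexicon-constrained vs. open-vocabulary word recognition
# scoring, assuming the common convention of snapping the recognizer output to
# the closest lexicon word by edit distance. All names here are illustrative.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def snap_to_lexicon(prediction, lexicon):
    """Replace the raw prediction with the closest lexicon word (case-insensitive)."""
    return min(lexicon, key=lambda w: edit_distance(prediction.lower(), w.lower()))

def word_accuracy(predictions, ground_truths, lexicons=None):
    """Word recognition accuracy in percent.
    lexicons: per-image word lists ("50"/"Full" settings) or None ("None" setting)."""
    correct = 0
    for i, (pred, gt) in enumerate(zip(predictions, ground_truths)):
        if lexicons is not None:
            pred = snap_to_lexicon(pred, lexicons[i])
        correct += int(pred.lower() == gt.lower())
    return 100.0 * correct / len(ground_truths)

With a 50-word lexicon, even a noisy raw prediction often snaps to the correct word, which is one reason the "50" columns in such tables sit well above the lexicon-free ("None") columns.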