Efficient Exploration of Text Regions in Natural Scene Images Using Adaptive Image Sampling

Abstract. An adaptive image sampling framework is proposed for identifying text regions in natural scene images. Since only a small fraction of the pixels actually correspond to text regions, it is desirable to eliminate non-text regions at the early stages of text detection. First, the image is sampled row by row at a specific rate, and each sampled row is tested for text using a 1D adaptation of the Maximally Stable Extremal Regions (MSER) algorithm. The rows surrounding each detection are then recursively sampled at finer rates until the text is fully contained. The adaptive sampling process is also performed along the vertical dimension for the identified regions. The final output is a binary mask that can be used for text detection and/or recognition. Experiments on the ICDAR’03 dataset show that the proposed approach is up to 7× faster than the MSER baseline on a single CPU core, with comparable text localization scores. The approach is also inherently parallelizable, which allows further speed improvements.

Keywords: Adaptive image sampling · Scene text detection · 1D maximally stable extremal regions (1D MSER) · Mobile applications
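To make the row test and the recursive refinement more concrete, the Python sketch below implements one plausible reading of the abstract under stated assumptions; it is not the authors' code. All names (`runs_at_or_below`, `stable_runs`, `row_has_text`, `adaptive_row_mask`) and parameters (threshold step `delta`, relative-growth limit `max_var`, run-length bounds, `min_regions`, coarse `step`) are hypothetical. The stability test is a simplified, one-sided version of the MSER criterion (it only measures growth from threshold t to t + delta), a row is assumed to contain text when it has at least `min_regions` stable runs in either polarity, and the refinement probes outward from each text row, halving the probe gap on every miss. Only the row pass is shown; the pass along the other dimension would apply the same idea within the detected bands.

```python
"""Hypothetical sketch: 1D MSER-style row test plus adaptive row sampling.

Not the authors' implementation. The stability rule, thresholds and the
"text row" decision (min_regions stable runs) are illustrative assumptions.
"""
from typing import Dict, List, Tuple

Run = Tuple[int, int]  # half-open pixel range [start, end) within one row


def runs_at_or_below(row: List[int], t: int) -> List[Run]:
    """1D connected components: maximal runs of pixels with intensity <= t."""
    runs, start = [], None
    for i, v in enumerate(row):
        if v <= t and start is None:
            start = i
        elif v > t and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def stable_runs(row: List[int], delta: int = 16, max_var: float = 0.25,
                min_len: int = 2, max_len: int = 0) -> List[Run]:
    """Simplified, one-sided 1D MSER: keep runs whose length grows by at most
    max_var (relative) when the threshold rises from t to t + delta."""
    max_len = max_len or len(row) // 2
    found = set()
    for t in range(0, 256 - delta, delta):
        grown = runs_at_or_below(row, t + delta)
        for a, b in runs_at_or_below(row, t):
            size = b - a
            if not min_len <= size <= max_len:
                continue
            # Runs are nested across thresholds: exactly one grown run contains [a, b).
            ga, gb = next(r for r in grown if r[0] <= a and b <= r[1])
            if (gb - ga - size) / size <= max_var:
                found.add((a, b))
    return sorted(found)


def row_has_text(row: List[int], min_regions: int = 3) -> bool:
    """Assumed decision rule: several stable runs in either polarity
    (dark-on-light or light-on-dark) suggest character strokes."""
    inverted = [255 - v for v in row]
    return max(len(stable_runs(row)), len(stable_runs(inverted))) >= min_regions


def adaptive_row_mask(image: List[List[int]], step: int = 8) -> Tuple[List[bool], int]:
    """Test every step-th row, then probe outward from each text row, halving
    the probe gap whenever a probe misses. Returns (row mask, rows tested)."""
    h = len(image)
    cache: Dict[int, bool] = {}

    def is_text(y: int) -> bool:
        if y not in cache:  # each row is tested at most once
            cache[y] = row_has_text(image[y])
        return cache[y]

    mask = [False] * h
    stack = []
    for y in range(0, h, step):  # coarse pass at the initial sampling rate
        if is_text(y):
            mask[y] = True
            stack += [(y, step // 2, -1), (y, step // 2, +1)]
    while stack:
        y, gap, d = stack.pop()
        if gap == 0:
            continue  # finest rate reached: boundary found in this direction
        ny = y + d * gap
        if 0 <= ny < h and is_text(ny):
            # Assumption: rows between two text rows belong to one text band.
            for k in range(min(y, ny), max(y, ny) + 1):
                mask[k] = True
            stack.append((ny, gap, d))      # keep moving at the same rate
        else:
            stack.append((y, gap // 2, d))  # miss (or border): refine at half the gap
    return mask, len(cache)


if __name__ == "__main__":
    bg, ink = 200, 30
    text_row = [bg] * 4 + ([ink] * 3 + [bg] * 3) * 5 + [bg] * 6  # 40 pixels
    image = [text_row if 12 <= y < 20 else [bg] * 40 for y in range(32)]
    mask, tested = adaptive_row_mask(image, step=8)
    print([y for y, m in enumerate(mask) if m], tested)
```

On the synthetic 32-row image in the `__main__` block, the sketch prints `[12, 13, 14, 15, 16, 17, 18, 19] 10`: it recovers the text band while running the row test on only 10 of the 32 rows. The saving comes from the assumption that rows lying between two confirmed text rows are text and need not be tested individually; a real system would need to validate that assumption.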

1 Introduction

Recent advances in digital imaging technology enable users to take high-quality digital pictures and videos in their natural environments. These images and videos can be used to automate various tasks such as product search, automatic navigation, license plate detection and recognition, surveillance, and helping elderly or disabled people recognize their environment. The presence of text in scene images provides valuable information about the content of the image. The research question is how to detect and recognize text in scene images effectively, and how to do so in real time on mobile devices. Commercial Optical Character Recognition (OCR) systems are reasonably accurate (i.e., over 95%) for recognizing text in document images [23]. However, text detection and recognition accuracies are generally much lower for natural scene images. In ICDAR’15, most methods performed below 40%, with the exception of “AJOU” [10] and “Stradvision-1”. Both of these methods were based on variants of the MSER algorithm followed by different grouping approaches [8].

© Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part I, LNCS 9913, pp. 427–439, 2016. DOI: 10.1007/978-3-319-46604-0_31


I.Z. Yalniz et al.

Fig. 1. Example scene images from the ICDAR’03 dataset [12]: (a) low color contrast, (b) illumination effects, (c) unconventional text layout.

Image blur, low resolution, low contrast, unconventional text layout, non-uniform backgrounds, and lighting and perspective changes are among the factors that make the problem challenging, as seen in Fig. 1. The most common approach for recognizing text in scene images is to localize each word and/or character in the input image and then classify each of them independently [3,22]. In these approaches, the performance of the overall text recognition framework depends heavily on the success of the te