Real-time localization of multi-oriented text in natural scene images using a linear spatial filter

ORIGINAL RESEARCH PAPER

Xavier Gironés¹ · Carme Julià¹
Received: 26 April 2019 / Accepted: 5 September 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract

This paper proposes a multi-oriented text localization method for natural images that is suitable for real-time processing of high-definition video on portable and mobile devices. Our method is based on the connected components (CC) approach: first, CC are isolated by convolving a multi-scale pyramid with a specifically designed linear spatial filter, followed by hysteresis thresholding. Next, non-textual CC are pruned by a local classifier consisting of a cascade of multilayer perceptrons (MLPs) fed with increasingly extended feature vectors. The stroke width feature is estimated in linear time by computing the maximal inscribed squares in the CC. Candidate CC and their neighbors are then checked using a more context-aware neural network classifier that takes into account the target CC and their vicinity. Finally, text sequences are extracted in all pyramid levels and fused using dynamic programming. The main contribution of the work presented here is execution speed: the CPU-only parallel implementation of the proposed method is capable of processing 1080p HD video at nearly 30 frames per second on a standard laptop. Furthermore, when benchmarked on the ICDAR 2013 Robust Reading and the ICDAR 2015 Incidental Scene Text data sets, our system runs more than twice as fast as the state of the art, while still delivering competitive results in terms of precision and recall.

Keywords Text detection · Real-time · Multi-oriented · Linear spatial filter
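The linear-time stroke width estimate mentioned in the abstract can be illustrated with the classic dynamic-programming computation of maximal squares in a binary mask. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `maximal_square_sizes` and `estimate_stroke_width` are hypothetical, the CC is assumed to be given as a list of rows of 0/1 values, and taking the largest inscribed square side as the stroke width is a simplification chosen for illustration.

```python
def maximal_square_sizes(mask):
    """Return a grid where each cell holds the side length of the largest
    all-foreground square whose bottom-right corner is that cell.

    `mask` is a list of rows containing 0 (background) or 1 (foreground).
    Each pixel is visited once, so the cost is linear in the pixel count.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    size = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # background pixels cannot end a square
            if r == 0 or c == 0:
                size[r][c] = 1  # border pixels can only form a 1x1 square
            else:
                # A square ending here extends the smallest of the three
                # squares ending at the upper, left, and upper-left neighbors.
                size[r][c] = 1 + min(size[r - 1][c],
                                     size[r][c - 1],
                                     size[r - 1][c - 1])
    return size


def estimate_stroke_width(mask):
    """Crude stroke width estimate (an assumption of this sketch): the side
    of the largest square that fits inside the component."""
    size = maximal_square_sizes(mask)
    return max((s for row in size for s in row), default=0)


# Usage: a horizontal bar 3 pixels thick and 7 pixels long.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
width = estimate_stroke_width(bar)
```

For the bar above, the largest inscribed square is limited by the 3-pixel thickness rather than the 7-pixel length, which is why inscribed squares are a natural proxy for stroke width.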

1 Introduction

While optical character recognition (OCR) on scanned documents has been intensively studied during the past decades and has attained a degree of maturity, the problem of text detection and recognition in natural scene images, known as text spotting, remains a challenge. Factors present in natural images such as background clutter, occlusions, poor lighting conditions, shadows, perspective distortion, blurring, and variations in font, scale, and orientation make the task of text spotting more difficult than the typical OCR operation [1]. Text in natural scenes carries semantic information that can be used in a number of applications such as license plate recognition [2], automated street sign translation [3], automatic indexing of video data [4], or help for the visually impaired [5], to name a few. Furthermore, the range of possible applications is growing rapidly due to the increasing availability of high-performance mobile devices equipped with cameras, such as smartphones and tablet computers. Traditionally, there have been two main categories of text detection methods described in the literature. On the one hand, texture-based methods

* Xavier Gironés [email protected]
Carme Julià [email protected]
1 Universitat Rovira i Virgili, Tarragona, Spain
