Real-time localization of multi-oriented text in natural scene images using a linear spatial filter


ORIGINAL RESEARCH PAPER

Xavier Gironés¹ · Carme Julià¹

Received: 26 April 2019 / Accepted: 5 September 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract

This paper proposes a multi-oriented text localization method for natural images that is suitable for real-time processing of high-definition video on portable and mobile devices. Our method is based on the connected components (CC) approach: first, CC are isolated by convolving a multi-scale pyramid with a specifically designed linear spatial filter, followed by hysteresis thresholding. Next, non-textual CC are pruned using a local classifier consisting of a cascade of multilayer perceptrons (MLP) fed with increasingly extended feature vectors. The stroke width feature is estimated in linear time by computing the maximal inscribed squares in the CC. Candidate CC and their neighbors are then checked using a more context-aware neural network classifier that takes into account the target CC and their vicinity. Finally, text sequences are extracted in all pyramid levels and fused using dynamic programming. The main contribution of the work presented here is execution speed: the CPU-only parallel implementation of the proposed method is capable of processing 1080p HD video at nearly 30 frames per second on a standard laptop. Furthermore, when benchmarked on the ICDAR 2013 Robust Reading and the ICDAR 2015 Incidental Scene Text data sets, our system runs more than twice as fast as the state of the art, while still delivering competitive results in terms of precision and recall.

Keywords  Text detection · Real-time · Multi-oriented · Linear spatial filter
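As a rough illustration of the stroke width idea mentioned in the abstract, the following minimal sketch computes maximal inscribed squares in a binary connected-component mask with the classic dynamic-programming recurrence, which visits each pixel once and is therefore linear in the number of pixels. It is not the authors' implementation: the function names `maximal_square_sides` and `estimate_stroke_width` are hypothetical, and taking the largest square side as the stroke width is a simplifying assumption made here for illustration.

```python
def maximal_square_sides(mask):
    """Return a grid where entry (r, c) is the side of the largest
    all-foreground square whose bottom-right corner is pixel (r, c)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    side = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # background pixels cannot end a square
            if r == 0 or c == 0:
                side[r][c] = 1  # border pixels only fit a 1x1 square
            else:
                # A square ending here extends the smallest of the squares
                # ending at the top, left, and top-left neighbors.
                side[r][c] = 1 + min(side[r - 1][c],
                                     side[r][c - 1],
                                     side[r - 1][c - 1])
    return side


def estimate_stroke_width(mask):
    """Crude stroke width proxy (assumption): the side of the largest
    square inscribed anywhere in the component mask."""
    side = maximal_square_sides(mask)
    return max((v for row in side for v in row), default=0)


# A vertical bar three pixels wide: the widest inscribed square has side 3.
bar = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
print(estimate_stroke_width(bar))  # 3
```

Each pixel is visited once and needs only three previously computed neighbors, so the cost grows linearly with the mask size. How the per-pixel square sizes are aggregated into a single feature is a design choice; the maximum used above is only one option.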

Xavier Gironés · Carme Julià
¹ Universitat Rovira i Virgili, Tarragona, Spain

1 Introduction

While optical character recognition (OCR) on scanned documents has been studied intensively over the past decades and has attained a degree of maturity, the detection and recognition of text in natural scene images, known as text spotting, remains a challenge. Factors present in natural images, such as background clutter, occlusions, poor lighting conditions, shadows, perspective distortion, blurring, and variations in font, scale, and orientation, make text spotting more difficult than the typical OCR operation [1]. Text in natural scenes carries semantic information that can be used in a number of applications, such as license plate recognition [2], automated street sign translation [3], automatic indexing of video data [4], or help for the visually impaired [5], to name a few. Furthermore, the range of possible applications is growing rapidly due to the increasing availability of high-performance mobile devices equipped with cameras, such as smartphones and tablet computers.

Traditionally, there have been two main categories of text detection methods described in the literature. On the one hand, texture-based methods