Scene Text Detection and Recognition: The Deep Learning Era


Shangbang Long1 · Xin He2 · Cong Yao3

Received: 14 April 2020 / Accepted: 8 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, methodology, and performance. This survey summarizes and analyzes the major changes and significant progress in scene text detection and recognition in the deep learning era. Through this article, we aim to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead to future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the grand challenges that remain. We expect this review to serve as a reference for researchers in this field. Related resources are also collected in our GitHub repository (https://github.com/Jyouhou/SceneTextPapers).

Keywords Scene text · Optical character recognition · Detection · Recognition · Deep learning · Survey

Communicated by Vittorio Ferrari.

Shangbang Long [email protected]
Xin He [email protected]
Cong Yao [email protected]

1 Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
2 ByteDance Ltd, Beijing, China
3 MEGVII Inc. (Face++), Beijing, China

1 Introduction

Undoubtedly, text is among the most brilliant and influential creations of humankind. As the written form of human languages, text makes it feasible to reliably and effectively spread or acquire information across time and space. In this sense, text constitutes the cornerstone of human civilization. On the one hand, as a vital tool for communication and collaboration, text plays a more important role than ever in modern society; on the other hand, the rich and precise high-level semantics embodied in text can be beneficial for understanding the world around us. For example, text information can be used in a wide range of real-world applications, such as image search (Tsai et al. 2011; Schroth et al. 2011), instant translation (Dvorin and Havosha 2009; Parkinson et al. 2016), robot navigation (DeSouza and Kak 2002; Liu and Samarabandu 2005a, b; Schulz et al. 2015), and industrial automation (Ham et al. 1995; He et al. 2005; Chowdhury and Deb 2013). Therefore, automatic text reading from natural environments, as illustrated in Fig. 1, a.k.a. scene text detection and recognition (Zhu et al. 2016) or PhotoOCR (Bissacco et al. 2013), has become an increasingly popular and important research topic in computer vision. However, despite years of research, a series of grand challenges may still be encountered when detecting and re

