Scene Text Detection and Recognition: The Deep Learning Era


Shangbang Long1 · Xin He2 · Cong Yao3

Received: 14 April 2020 / Accepted: 8 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advancements in mindset, methodology, and performance. This survey aims to summarize and analyze the major changes and significant progress in scene text detection and recognition in the deep learning era. Through this article, we seek to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead into future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the remaining grand challenges. We expect that this review paper will serve as a reference book for researchers in this field. Related resources are also collected in our GitHub repository (https://github.com/Jyouhou/SceneTextPapers).

Keywords Scene text · Optical character recognition · Detection · Recognition · Deep learning · Survey

Communicated by Vittorio Ferrari.

1 Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
2 ByteDance Ltd, Beijing, China
3 MEGVII Inc. (Face++), Beijing, China

1 Introduction
Undoubtedly, text is among the most brilliant and influential creations of humankind. As the written form of human languages, text makes it feasible to reliably and effectively spread or acquire information across time and space. In this sense, text constitutes the cornerstone of human civilization. On the one hand, as a vital tool for communication and collaboration, text plays a more important role than ever in modern society; on the other hand, the rich and precise high-level semantics embodied in text can be beneficial for understanding the world around us. For example, text information can be used in a wide range of real-world applications, such as image search (Tsai et al. 2011; Schroth et al. 2011), instant translation (Dvorin and Havosha 2009; Parkinson et al. 2016), robot navigation (DeSouza and Kak 2002; Liu and Samarabandu 2005a, b; Schulz et al. 2015), and industrial automation (Ham et al. 1995; He et al. 2005; Chowdhury and Deb 2013). Therefore, automatic text reading from natural environments, as illustrated in Fig. 1, a.k.a. scene text detection and recognition (Zhu et al. 2016) or PhotoOCR (Bissacco et al. 2013), has become an increasingly popular and important research topic in computer vision. However, despite years of research, a series of grand challenges may still be encountered when detecting and re
