Anchor-free multi-orientation text detection in natural scene images


Liqiong Lu1 · Dong Wu1 · Tao Wu1 · Faliang Huang2 · Yaohua Yi3

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Text detection in natural scene images is a key prerequisite for computer vision tasks such as image search, blind navigation, autopilot, and multi-language translation. Existing text detection methods often detect only part of a large-scale text region and have difficulty detecting small-scale text. To address this problem, an anchor-free multi-orientation text detection method is proposed. First, a Feature Pyramid Network (FPN) is used to combine multiple feature layers of a Convolutional Neural Network (CNN) to predict the geometric properties of text; this expands the receptive field of each pixel and thus helps detect more large-scale text. Second, a new loss function that is independent of text scale is designed, which gives pixels in small-scale text a larger weight in the loss calculation and thereby facilitates the detection of small-scale text. Finally, the results of pixel-level semantic segmentation are used to filter out obviously unreasonable candidate text boxes, which improves both the accuracy and the recall of text detection. Experimental results on ICDAR 2015 and MSRA-TD500 demonstrate the good performance of our method.

Keywords Text detection · Natural scene image · Anchor-free · Convolutional Neural Network
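The abstract does not give the formula of the scale-independent loss, so the following is only a minimal sketch of one common way to obtain that behaviour, not the authors' loss. It assumes a hypothetical instance map in which 0 marks background and each positive integer marks one text instance, and it weights every text pixel by the inverse of its instance's area, so a small text region contributes as much to the loss as a large one. The function names and the L1 regression term are illustrative assumptions.

```python
from collections import Counter


def scale_balanced_weights(instance_map):
    """Per-pixel weights such that the weights of each text instance sum to 1.

    instance_map: 2D list of ints; 0 is background, k > 0 is the id of a
    text instance (hypothetical encoding, not taken from the paper).
    """
    # Area (pixel count) of every text instance.
    areas = Counter(v for row in instance_map for v in row if v > 0)
    # Pixels of small instances receive larger weights; background gets 0.
    return [[1.0 / areas[v] if v > 0 else 0.0 for v in row]
            for row in instance_map]


def scale_balanced_l1(pred, target, instance_map):
    """Weighted L1 loss averaged over text instances rather than over pixels."""
    weights = scale_balanced_weights(instance_map)
    n_instances = len({v for row in instance_map for v in row if v > 0})
    if n_instances == 0:
        return 0.0
    total = 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            total += w * abs(p - t)
    return total / n_instances


# One 4-pixel instance (id 1) and one 1-pixel instance (id 2).
instance_map = [[1, 1, 1, 1],
                [0, 0, 0, 2]]
weights = scale_balanced_weights(instance_map)
pred = [[1.0] * 4, [1.0] * 4]
target = [[0.0] * 4, [0.0] * 4]
loss = scale_balanced_l1(pred, target, instance_map)
```

In this toy case the single pixel of the small instance carries the same total weight as all four pixels of the large instance, which is the effect the abstract attributes to the scale-independent loss.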

1 School of Information Engineering, Lingnan Normal University, Zhanjiang, 524048, China
2 School of Computer and Information Engineering, Nanning Normal University, Nanning, 530001, China
3 School of Printing and Packaging, Wuhan University, Wuhan, 430072, China

1 Introduction

Scene text images, such as billboards in street-view images, house numbers on streets, signs at intersections, and traffic signs on highways, contain a large amount of text information with clear semantics, which is a key clue for describing and understanding the content of scene images. Effectively identifying this textual information is crucial to computer vision tasks such as text mining and intelligent navigation [1]. However, before the content or script type of text in an image can be recognized [2], the text must first be located. Text detection is the technology that determines the location of text in images, and it has become an active research topic in computer vision. Text detection in natural scene images is challenged mainly by three aspects: (1) the diversity of scene text: text in a natural scene image may have completely different fonts, colors, scales, and orientations; (2) the complexity of the background: the background in a natural scene can be very complex, and elements such as signs, fences, bricks, and grass are almost indistinguishable from text, which can easily cause confusion and errors; (
