Single shot multi-oriented text detection based on local and non-local features



ORIGINAL PAPER

XiaoQian Li1,2 · Jie Liu1,3 · ShuWu Zhang1,2,3 · GuiXuan Zhang1,3 · Yang Zheng1

Received: 27 June 2019 / Revised: 3 April 2020 / Accepted: 14 July 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract To improve the robustness of text detectors to scene text of various scales, this paper proposes a single shot text detector that combines local and non-local features. A dilated inception module extracts local features and a text self-attention module extracts non-local features; both are integrated into the single shot detector (SSD), a generic object detector, to perform multi-oriented text detection in natural scenes. The proposed modules provide a richer and wider receptive field and enhance feature representation, which improves the performance of our text detector. In addition, whereas previous SSD-based text detectors classify positive and negative samples according to default boxes, we use pixels as the reference for more accurate matching with the ground truth, which avoids complex anchor design. To evaluate the effectiveness of the proposed method, we carry out several comparative experiments on public standard benchmarks and analyze the results in detail. The experimental results show that the proposed text detector is competitive with state-of-the-art methods.

Keywords Text detection · Natural scene text · Convolutional neural network · Attention mechanism
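The abstract's claim that dilated convolutions give a wider receptive field can be illustrated with simple arithmetic. The sketch below uses the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input pixels. The 3x3 kernels and the dilation rates 1, 2 and 3 are hypothetical values chosen for illustration; the abstract does not state the configuration of the dilated inception module.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride, dilation) conv layers."""
    rf, jump = 1, 1  # jump: distance in input pixels between adjacent outputs
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical parallel branches: same 3x3 kernel, different dilation rates
# (assumed values, not taken from the paper).
branches = {d: receptive_field([(3, 1, d)]) for d in (1, 2, 3)}
print(branches)  # {1: 3, 2: 5, 3: 7}
```

With the same number of weights per kernel, the branch with dilation 3 covers a 7-pixel span instead of 3, which is the sense in which parallel dilated branches can widen the receptive field without adding parameters.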

Yang Zheng [email protected]
XiaoQian Li [email protected]
Jie Liu [email protected]
ShuWu Zhang [email protected]
GuiXuan Zhang [email protected]

1 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
3 AICFVE, Beijing Film Academy, Beijing 100088, China

1 Introduction With the rapid development of computer science and the widespread adoption of the Internet and smartphones, using the cameras of mobile devices to obtain and share information has gradually become a way of life. Text is naturally an important clue that helps people understand images and videos, so text detection has attracted much research interest in computer vision. Research on natural scene text detection is not only of great theoretical significance to computer vision but also has promising applications in fields such as content-based image/video retrieval, intelligent translation, and card identification. The main difficulties in text detection are as follows: (1) the randomness of backgrounds and the presence of text-like background patterns, (2) the diversity of text fonts, languages, orientations, shapes, and so on, and (3) images captured under poor shooting conditions or containing occluded text. Therefore, it is vital to find a detector with robust feature representatio