Detecting Text in Natural Image with Connectionist Text Proposal Network



1 Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
2 University of Oxford, Oxford, UK
3 The Chinese University of Hong Kong, Sha Tin, Hong Kong
{zhi.tian,wl.huang,tong.he,pan.he,yu.qiao}@siat.ac.cn

Abstract. We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore the rich context information of the image, making it effective at detecting extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods that require multi-step post-filtering. It achieves 0.88 and 0.61 F-measure on the ICDAR 2013 and 2015 benchmarks, surpassing recent results [8, 35] by a large margin. The CTPN is computationally efficient, at 0.14 s/image, using the very deep VGG16 model [27]. An online demo is available at http://textdet.com/.

Keywords: Scene text detection · Convolutional network · Recurrent neural network · Anchor mechanism

1 Introduction

Reading text in natural images has recently attracted increasing attention in computer vision [1,8–11,14,15,28,32,35]. This is due to its numerous practical applications, such as image OCR, multi-language translation, and image retrieval. It includes two sub-tasks: text detection and recognition. This work focuses on the detection task [1,14,28,32], which is more challenging than the recognition task carried out on a well-cropped word image [9,15]. The large variance of text patterns and highly cluttered backgrounds pose the main challenges for accurate text localization.

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VIII, LNCS 9912, pp. 56–72, 2016. DOI: 10.1007/978-3-319-46484-8_4


Fig. 1. (a) Architecture of the Connectionist Text Proposal Network (CTPN). We densely slide a 3×3 spatial window through the last convolutional maps (conv5) of the VGG16 model [27]. The sequential windows in each row are recurrently connected by a Bi-directional LSTM (BLSTM) [7], where the convolutional feature (3×3×C) of each window is used as input of the 256D BLSTM (including two 128D LSTMs). The RNN layer is connected to a 512D fully-connected layer, followed by the output layer, which jointly predicts text/non-text scores, y-axis coordinates and side-refinement offsets of k anchors. (b) The CTPN outputs sequential fixed-width fine-scale text proposals. Color of each box indicates the text/non-text score.
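
To make the data flow in Fig. 1(a) concrete, the following is a minimal shape-bookkeeping sketch, not the authors' implementation. It tracks only tensor shapes through the pipeline described in the caption. Several values are assumptions not stated in this excerpt: a conv5 stride of 16 and C = 512 channels (standard for VGG16), a padded 3×3 window so that the spatial size is preserved, k = 10 anchors, and two scores, two vertical coordinates and one side-refinement offset per anchor. The names `ctpn_shapes` and `CTPNShapes` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Shape = Tuple[int, int, int]  # (rows, steps per row, feature dimension)


@dataclass(frozen=True)
class CTPNShapes:
    windows: Shape       # 3x3xC feature of each sliding window
    blstm: Shape         # concatenated forward and backward LSTM states
    fc: Shape            # fully-connected layer output
    scores: Shape        # text/non-text scores for k anchors
    y_coords: Shape      # y-axis coordinates for k anchors
    side_offsets: Shape  # side-refinement offsets for k anchors


def ctpn_shapes(img_h: int, img_w: int, channels: int = 512, stride: int = 16,
                k: int = 10, lstm_hidden: int = 128,
                fc_dim: int = 512) -> CTPNShapes:
    """Return per-stage output shapes for an img_h x img_w input image."""
    # Assumption: conv5 downsamples by `stride`, and the 3x3 window is
    # padded, so there is one window per conv5 location.
    rows, cols = img_h // stride, img_w // stride
    # Each row of conv5 is treated as one sequence of `cols` windows.
    windows = (rows, cols, 3 * 3 * channels)
    # A bi-directional LSTM concatenates two 128D hidden states (256D).
    blstm = (rows, cols, 2 * lstm_hidden)
    fc = (rows, cols, fc_dim)
    # Assumption: 2 scores, 2 vertical coordinates, 1 side offset per anchor.
    return CTPNShapes(
        windows=windows,
        blstm=blstm,
        fc=fc,
        scores=(rows, cols, 2 * k),
        y_coords=(rows, cols, 2 * k),
        side_offsets=(rows, cols, k),
    )


if __name__ == "__main__":
    print(ctpn_shapes(608, 912))
```

Under these assumptions, a 608×912 input yields 38 rows of 57 windows each, so the BLSTM processes 38 sequences of length 57 and every window emits predictions for its k anchors.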
