Robust Segmentation for Video Captions with Complex Backgrounds


Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China [email protected], [email protected], [email protected], [email protected]

Abstract. Caption text contains rich information that can be used for video indexing and summarization. In this paper, we propose an effective caption text segmentation approach to improve OCR accuracy. An AlexNet CNN is first trained with path signatures for text tracking. We then use an improved adaptive thresholding method to segment caption text in individual frames. Finally, multi-frame integration is performed with gamma correction and region growing. In contrast to conventional methods, which extract video text in individual frames independently, we exploit the temporal characteristics specific to video to perform segmentation. Moreover, the proposed method effectively removes complex backgrounds whose intensity is similar to that of the text. Experimental results on different videos, and comparisons with other methods, demonstrate the effectiveness of our approach.

Keywords: Caption text segmentation · Convolutional neural networks · Path signature · Multi-frame integration

1 Introduction

As digital multimedia technology develops rapidly, video has become one of the most popular media forms, delivered via TV broadcasting and the Internet. Text in video, especially caption text, is a significant high-level semantic feature that directly describes the video content. For instance, scores in sports programs let viewers grasp the game situation at a glance, headlines in news videos summarize the news content, and captions in films aid understanding of the storyline. Thus, caption text extraction and recognition play an important role in video indexing [1] and summarization. However, state-of-the-art Optical Character Recognition (OCR) does not work well on text in video, although it has achieved excellent performance on printed documents. In contrast to binary document images, video text presents far more difficulty due to complex backgrounds, a variety of text fonts, and the low contrast caused by lossy compression.

Before text recognition is performed, three phases are generally involved: text detection, text tracking, and text segmentation. Text detection locates the text region within a single video frame, text tracking maintains the consistency of the text between consecutive frames, and text segmentation separates text from background (i.e., binarization). After segmentation, we obtain a binary image, i.e., a mask, which is better suited to OCR engines.

© Springer Nature Singapore Pte Ltd. 2016
T. Tan et al. (Eds.): CCPR 2016, Part II, CCIS 663, pp. 89–100, 2016.
DOI: 10.1007/978-981-10-3005-5_8

Z.-H. Xing et al.

Although many methods have been proposed for video text segmentation in the past decades, most of them extract text only in individual frames independently. Even if
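The segmentation phase reduces to binarization: producing a mask that separates text pixels from background for an OCR engine. As a rough illustration only (not the paper's improved method), the sketch below binarizes a grayscale crop with a plain local-mean adaptive threshold; the `window` and `offset` parameters are illustrative choices, and captions are assumed brighter than their surroundings.

```python
import numpy as np

def adaptive_threshold(gray, window=15, offset=10):
    """Local-mean binarization: a pixel becomes 255 if it exceeds the
    mean of its window x window neighbourhood minus `offset`, else 0."""
    h, w = gray.shape
    pad = window // 2
    # Edge padding so border pixels get full-sized neighbourhoods.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Integral image: each local window sum is computed in O(1).
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    y, x = np.mgrid[0:h, 0:w]
    sums = (ii[y + window, x + window] - ii[y, x + window]
            - ii[y + window, x] + ii[y, x])
    mean = sums / (window * window)
    return np.where(gray > mean - offset, 255, 0).astype(np.uint8)
```

Applied to a caption crop, bright text strokes survive as 255 in the mask while slowly varying backgrounds are suppressed; since flat regions trivially exceed their own mean minus `offset`, such a threshold is typically applied inside detected text regions rather than to whole frames.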