Caption Text Extraction from Color Image Based on Differential Operation and Morphological Processing


Abstract With the continuous development of multimedia technology, extracting text from images has become increasingly valuable. Text images can be divided into document text images, caption text images, and scene text images according to the characteristics of the text they contain, and researchers have proposed different extraction methods for each type. This paper puts forward a new caption text extraction method based on differential operations and morphological processing. First, three differential operators in the vertical, horizontal, and diagonal directions are used to detect caption text information, and morphological processing is then applied to further determine the caption regions. Finally, a logical “and” operation is applied to the three directional caption-region images and combined with a recursive statistics method to delete noise and extract the final caption region. The results show that this method can effectively extract the caption information in an image.

Keywords Text extraction processing · Edge detection · Morphology · Binary · Image
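The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes a grayscale image stored as a list of rows, uses simple neighbour differences as stand-ins for the three differential operators, and uses a square dilation as the morphological step. The function names, the threshold of 50, and the radius of 1 are all hypothetical choices, and the recursive statistics step for noise removal is left out because the abstract does not specify it.

```python
def directional_edges(gray, dy, dx, threshold):
    """Mark pixels whose intensity differs from the neighbour at (y+dy, x+dx)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - dy):
        for x in range(w - dx):
            if abs(gray[y][x] - gray[y + dy][x + dx]) > threshold:
                out[y][x] = 1
    return out


def dilate(binary, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def caption_mask(gray, threshold=50, radius=1):
    """AND together the dilated edge maps of three difference directions."""
    # Differences toward the right, lower, and lower-right neighbours
    # (assumed stand-ins for the paper's three directional operators).
    offsets = [(0, 1), (1, 0), (1, 1)]
    regions = [dilate(directional_edges(gray, dy, dx, threshold), radius)
               for dy, dx in offsets]
    h, w = len(gray), len(gray[0])
    return [[int(all(r[y][x] for r in regions)) for x in range(w)]
            for y in range(h)]


def make_toy_image():
    """10 x 14 dark image with three short strokes and one long flat line."""
    gray = [[0] * 14 for _ in range(10)]
    for y in range(1, 4):
        for x in (2, 4, 6):      # text-like vertical strokes
            gray[y][x] = 255
    for x in range(1, 13):       # long horizontal line (not text)
        gray[7][x] = 255
    return gray


if __name__ == "__main__":
    for row in caption_mask(make_toy_image()):
        print("".join("#" if v else "." for v in row))
```

On the toy image, the stroke block responds in all three directions and survives the "and" operation. The interior of the long line has no difference along its own direction and is suppressed; only its two ends remain, which is the kind of residual noise a later filtering step would need to remove.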

L. Ji (✉)
Electrical Engineering College in SuZhou Chien-Shiung Institute of Technology, Taicang, Jiangsu Province, China
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2018
S.K. Bhatia et al. (eds.), Advances in Computer and Computational Sciences, Advances in Intelligent Systems and Computing 554, https://doi.org/10.1007/978-981-10-3773-3_48

1 Introduction

With the continuous development of modern multimedia technology, images have become an important medium of communication, and the text in an image can illustrate the image's meaning. Text extraction from images with complex backgrounds is therefore of great significance [1–3]. Text images can be divided into document text images, caption text images, and scene text images according to the characteristics of the text in the image. Many researchers at home and abroad have studied text extraction technology, and they use different processing methods for different text images.

Fig. 1 Flowchart of the algorithm for caption text extraction

Alvaro et al. [4] used text recognition to determine the text area. Their method has three steps: first, character candidates are found by segmentation; then the connected components are analyzed further; finally, text lines are classified using gradient features of the text and support vector machines. H. Chen et al. [5] presented a new method for text detection that first takes maximally stable extremal regions (MSER) as text candidates and then eliminates non-text regions using the geometric and stroke-width properties of the candidates. Sumathi et al. [6] used the gamma correction method (GCM) to extract text from images. This approach uses texture analysis and measurement to evaluate the gamma value, which is applied to the original input image to obtain the background image. Keechul and Jung [7] put forward a method that uses texture and connected component analysis to locate