Text Extraction from Images: A Review


Abstract Multimedia, natural scenes, and images are sources of textual information. Text extracted from these sources can be used for automatic image and video indexing and for image structuring. However, variations in text style, size, alignment, and orientation, together with low image contrast and complex backgrounds, make text extraction challenging. In recent years, many methods for text extraction have been proposed. This paper analyzes and compares the performance of various methods used to extract text information from images. It summarizes these methods and the factors that affect their performance.



Keywords Text extraction ⋅ Text localization ⋅ Connected component ⋅ Edge-based approach ⋅ Text segmentation

1 Introduction

Text is a source of information that can be embedded in documents or in images. Both computers and humans can readily understand text-based information. In this digital era, images have become a popular means of communication. People now commonly send images with text embedded in them, a practice that is especially popular on social media platforms such as Facebook, WhatsApp, and Twitter. Images with text often convey emotion better than plain text messages on social media. Relevant information can also be extracted from images on the World Wide Web to make web search more efficient. Text in images is useful in many content-based image applications, such as web image search, mobile-based text analysis, video indexing, and human-computer interaction systems.

N. Sharma (✉) ⋅ Nidhi Department of IT, UIET, PU, Chandigarh, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2018 D.K. Mishra et al. (eds.), Information and Communication Technology for Sustainable Development, Lecture Notes in Networks and Systems 10, https://doi.org/10.1007/978-981-10-3920-1_16


This paper is arranged as follows: the steps involved in text extraction are described in Sect. 2. Section 3 analyzes various real-time applications of text extraction from images. Section 4 describes the challenges involved in extracting text from images. Section 5 briefly explains the methodologies used for text extraction. Finally, Sect. 6 presents a summary and the future scope of text extraction.

2 Steps of Text Extraction

The text extraction problem is divided into the following steps [1]:

(i) Text detection,
(ii) Text localization,
(iii) Text tracking,
(iv) Text extraction and enhancement, and
(v) Text recognition.

2.1 Text Detection

As there is no prior information on whether an image contains text, the text detection step determines whether text is present in the given image. This can be done using pixel intensity. It is assumed that text has higher pixel intensity than background pixels, so pixels with a value less than a predefined threshold value and having significant color difference from neighborin