Scene text detection using enhanced Extremal region and convolutional neural network

Fatemeh Naiemi 1 & Vahid Ghods 1 & Hassan Khalesi 2

Received: 10 September 2019 / Revised: 26 May 2020 / Accepted: 9 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Text in scene images usually carries significant information. Detecting and recognizing scene text is important for a variety of advanced machine vision applications, such as image and video retrieval, automotive assistance, and multilingual translation. In particular, most text recognition systems require text to be localized in the image beforehand, which makes detection a significant requirement. The purpose of this study is to provide a method for detecting text in natural images. The proposed approach combines the advantages of extremal region (ER) methods with classification by a convolutional neural network (CNN), which significantly reduces false positives and increases detection accuracy. Sliding windows of different sizes are used to determine text candidates. Enhanced ERs are extracted in three consecutive stages on three distinct color channels, R, G, and B, and the results are then combined by an add method. After grouping, the word candidates are classified into two classes, text and non-text, by a CNN classifier. A non-maximum suppression (NMS) algorithm is then applied to detections of the same word, so that the detection with the highest probability is selected. The average accuracy, recall, precision, and F-measure of the proposed text detection model on the ICDAR2013 database are 0.893, 0.962, 0.948, and 0.955, respectively. The optimal cut point of the proposed method is 0.648, which gives the highest average accuracy, 91.93%. The AUC values of the ROC and PR curves for the proposed model are 0.851 and 0.718, respectively. These AUC values show an outstanding improvement over the best detection rates of previous methods. Experimental results on the ICDAR2011, ICDAR2013, and ICDAR2015 databases also demonstrate that our algorithm outperforms state-of-the-art scene text detection methods.

Keywords Scene text detection · Extremal region · ER method · CNN · Natural image
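The final NMS step mentioned in the abstract can be illustrated with a minimal sketch of greedy non-maximum suppression. This is a generic textbook version, not the authors' implementation: the box format (x1, y1, x2, y2), the IoU threshold of 0.5, and the names `iou` and `nms` are assumptions made for illustration, and the scores stand in for the CNN text probabilities.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    iou_threshold is an illustrative default, not a value from the paper.
    """
    # Visit candidates from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not heavily overlap a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical word candidates: the first two cover the same word.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (200, 50, 300, 80)]
scores = [0.91, 0.78, 0.85]  # stand-ins for CNN text probabilities
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

In this example the second box overlaps the first with an IoU of about 0.84, so only the higher-scoring of the two survives, while the third box, which covers a different region, is kept.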

* Vahid Ghods
Extended author information available on the last page of the article

Multimedia Tools and Applications

1 Introduction

In recent years, the detection and recognition of text in scene images, with their various challenges, have been widely studied. Rotation and orientation, changes in scale, variety of fonts, image noise, and wild backgrounds make extracting and recognizing text from complex images more difficult [21]. In general, work on text detection and recognition is divided into three categories: text detection, text recognition, and end-to-end text recognition [48]. Text detection methods find the regions of the image that are likely to contain text. In the method of text recognition, the pro