A comparative approach on detecting multi-lingual and multi-oriented text in natural scene images



Aparna Yegnaraman1 · S. Valli1

Accepted: 23 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Text helps to convey the intended message to users very accurately. Detecting text in natural scene images, for both quadrilateral-type and polygon-type datasets, is the primary scope of this work. A regression-based method using a modified You Only Look Once (YOLOv4) network is used for quadrilateral-type datasets. Hyperparameters for training the network are optimized with a Genetic Algorithm, which proves to be a more suitable candidate than traditional methods. The PixelsIoU (PIoU) loss is introduced to derive accurate bounding boxes, and it remains effective under challenging scenarios with high aspect ratios and complex backgrounds. This yielded quick results for quadrilateral-type datasets but did not scale to arbitrarily-shaped and curved scene text, so the approach is changed to a segmentation-based one to enhance the results. A binarization operation is introduced into the segmentation network to boost its detection accuracy on polygon-type datasets. The new module DiffBiSeg (Differentiable Binarization in Segmentation network) improves postprocessing and text detection performance by setting the binarization thresholds in the segmentation network flexibly. The efficacy of both approaches is clearly seen in their respective experimental results.

Keywords Scene text detection · PIoU loss · Genetic algorithm · You only look once · Differentiable binarization · Flexible threshold
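To make the flexible-threshold idea behind DiffBiSeg concrete, the sketch below shows the standard differentiable-binarization approximation: the hard step function B = (P > T) is replaced by a steep sigmoid so that the threshold map T can be learned jointly with the probability map P. This is a minimal illustration of the general technique, not the paper's exact module; the function name, the toy arrays, and the amplification factor k = 50 are assumptions for illustration.

```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate hard binarization (P > T) with a steep sigmoid,
    1 / (1 + exp(-k * (P - T))), so gradients can flow through the
    threshold map T during training. Larger k gives a sharper step."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

# Toy 1x4 probability map with a uniform learned threshold of 0.3:
P = np.array([0.10, 0.29, 0.31, 0.90])
T = np.full_like(P, 0.3)
B_hat = differentiable_binarization(P, T)
# Pixels well above the threshold approach 1, pixels well below
# approach 0, and pixels near the threshold get a smooth response
# instead of an abrupt, non-differentiable jump.
```

Because the response is smooth near T, the network receives a useful gradient signal exactly where the text/background decision is hardest, which is what allows the thresholds to be set adaptively per pixel rather than fixed by hand.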

1 Introduction

Text characters embedded in natural scene images and video sequences represent a rich source of information for applications such as mobile visual search, content-based image retrieval, automatic sign translation, and image-based geolocation. Owing to the unconstrained scene environment, with varying text sizes and colors, complex and cluttered backgrounds, uncontrollable lighting conditions, etc., scene text detection remains a major challenge. Classic text detection methods involve many modules working together across multiple processing steps; heuristic rules must be framed, and parameters must be defined and tuned. Hence, detection speed is considerably reduced and good results are very difficult to achieve. With the advent of many object detection methods based on convolutional neural networks (CNNs) [32, 51, 53, 54], scene text detection has seen a significant impact, with remarkable improvements in accuracy and detection speed. R-CNN [12], proposed by Girshick et al. in 2014, employed a CNN to extract features for the first time in an object-detection framework and achieved commendable results compared with the state-of-the-art approach

Aparna Yegnaraman
[email protected]

S. Valli
[email protected]

1 Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai 600025, India