Arbitrary-shaped text detection with adaptive convolution and path enhancement pyramid network

Qi Cheng 1 & Guodong Wang 1 & Qian Dong 2 & Bin Wei 3
Received: 27 December 2019 / Revised: 22 July 2020 / Accepted: 28 July 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Scene text detection, an essential component of scene text reading, has recently become an active research field. Segmentation-based methods are especially common, since segmentation results can describe text of arbitrary shape. However, curved text varies widely in shape, scale and orientation, which makes it difficult to locate. The detector therefore needs to adjust the size of its local receptive fields adaptively, so that it can aggregate multi-scale spatial information and accurately locate curved text instances. Moreover, low-level features are critical for localizing large text instances. When a Feature Pyramid Network (FPN) is used for multi-scale feature fusion, the long path from the low level to the top level impedes the flow of accurate localization signals. To solve these two problems, this paper proposes an Adaptive Convolution and Path Enhancement Pyramid Network (ACPEPNet), which locates text instances of arbitrary shape more accurately. First, an Adaptive Convolution Unit is introduced to improve the ability of the backbone to aggregate multi-scale spatial information within the same stage. Specifically, this unit is lightweight and comes at no significant extra computational cost; based on it, we present a backbone network for text feature extraction. Second, the original FPN structure is redesigned to build a short path from the low level to the top level: the path is changed from a one-way flow to a two-way flow, and the original features are added in the final stage of information fusion. Experiments on CTW1500, Total-Text, ICDAR 2015 and MSRA-TD500 validate the robustness of the proposed method. Without bells and whistles, the method achieves an F-measure of 80.8% on CTW1500 without external training data.

Keywords Arbitrary shapes text detection . Adaptive convolution . Backbone network . Feature pyramid network . Multi-scale
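The two-way fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual ACPEPNet layers, so everything below is an assumption made for illustration: feature maps are plain 1-D Python lists, fusion is element-wise addition, and the hypothetical helpers `upsample2`, `downsample2` and `two_way_fusion` stand in for the learned upsampling, strided convolution and fusion modules a real network would use.

```python
def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2 (stand-in for a learned op)."""
    return [v for v in x for _ in range(2)]


def downsample2(x):
    """Stride-2 subsampling (stand-in for a strided convolution)."""
    return x[::2]


def add(a, b):
    """Element-wise sum of two feature lists of equal length."""
    assert len(a) == len(b), "feature sizes must match before fusion"
    return [p + q for p, q in zip(a, b)]


def two_way_fusion(feats):
    """Fuse backbone features ordered from low level (largest) to top level.

    Assumes each level is half the length of the previous one.
    """
    n = len(feats)

    # Top-down pass, as in a standard FPN: semantics flow to lower levels.
    top_down = [None] * n
    top_down[-1] = feats[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = add(feats[i], upsample2(top_down[i + 1]))

    # Bottom-up pass: a short path carrying low-level signals upward.
    bottom_up = [top_down[0]]
    for i in range(1, n):
        bottom_up.append(add(top_down[i], downsample2(bottom_up[i - 1])))

    # Final stage: add the original backbone features back in (assumed form).
    return [add(fused, original) for fused, original in zip(bottom_up, feats)]


if __name__ == "__main__":
    # Toy three-level pyramid with sizes 4, 2 and 1.
    pyramid = [[1, 1, 1, 1], [10, 10], [100]]
    for level in two_way_fusion(pyramid):
        print(level)
```

In this toy version the top level receives the low-level signal after only two downsampling steps, which is the point of the shortened path; how the paper weights or convolves each fusion step is not specified in this section.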

* Guodong Wang [email protected]
Extended author information available on the last page of the article

Multimedia Tools and Applications

1 Introduction

In recent years, scene text detection, a fundamental computer vision task, has become an active research field, since it is an essential step in many applications such as autonomous driving, scene understanding and text recognition. With the rapid development of Convolutional Neural Networks [7, 9, 13, 17, 46, 47], much progress has been made [19, 20, 39, 43]. Scene text detection methods can be roughly divided into two categories: regression-based methods and segmentation-based methods. Segmentation-based methods in particular have received much attention, since segmentation results can describe text of arbitrary shape, such as curved text. Some new approaches [19, 24, 27, 45]
