DetectGAN: GAN-based text detector for camera-captured document images
ORIGINAL PAPER
Jinyuan Zhao1,2 · Yanna Wang1 · Baihua Xiao1 · Cunzhao Shi1 · Fuxi Jia1 · Chunheng Wang1
Received: 1 December 2019 / Revised: 10 June 2020 / Accepted: 23 July 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract With the spread of electronic devices, camera-based text processing has attracted growing attention. Unlike scene-image recognition, a recognition system for document images must organize its results and store them in a structured document for subsequent data processing. In document images, however, whether adjacent characters should be merged into one text line depends largely on their semantic content rather than only on the distance between them, which confuses learning during training. In addition, multi-directional printed characters in document images require extra directional information to guide the subsequent recognition task. To avoid this learning confusion and obtain recognition-friendly detection results, we propose DetectGAN, a character-level text detection framework based on conditional generative adversarial networks (cGAN). The framework removes position regression and non-maximum suppression (NMS) and recasts text detection directly as an image-to-image generation problem. Experimental results show that our method performs well on text detection in camera-captured document images and outperforms both classical and state-of-the-art algorithms.
Keywords Text detection · Camera-captured document images · Multi-scale context features · Generative adversarial networks
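To make the image-to-image formulation concrete, the sketch below shows one way character boxes could be read off a generated binary character map without position regression or NMS. It is an illustrative assumption, not the post-processing described in the paper: the function name `boxes_from_mask`, the toy `generated_mask`, and the use of 4-connected flood fill are all hypothetical choices.

```python
from collections import deque


def boxes_from_mask(mask):
    """Return one (top, left, bottom, right) box per 4-connected
    foreground region of a binary mask given as a list of 0/1 rows.

    Hypothetical helper: stands in for whatever post-processing turns a
    generated character map into boxes; no box scoring or NMS is used.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one region and track its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy stand-in for a generator output: two separate character blobs.
generated_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
character_boxes = boxes_from_mask(generated_mask)
print(character_boxes)
```

Because each connected region yields exactly one box, there are no overlapping candidates to suppress, which is the intuition behind dropping NMS in a generation-based detector.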
Jinyuan Zhao [email protected]
1 Institute of Automation, Chinese Academy of Sciences (CASIA), 95 Zhongguancun East Road, Beijing 100190, PR China
2 University of Chinese Academy of Sciences (UCAS), No. 19 (A) Yuquan Road, Shijingshan District, Beijing 100049, PR China
1 Introduction With the rapid development of imaging technology, digital cameras and other imaging devices have become increasingly popular [1,2], and the number of camera-captured document images is growing explosively. Demand for digital storage and processing of these document images is rising accordingly. Deep learning has brought general object detection and semantic segmentation models to the scene text detection task, with good results. These scene text detection methods usually produce text-line-level detections and handle the details of text location only coarsely. However, in daily use, in addition to the image degradation caused by document printing and image acquisition, structured document images also call for structured recognition results that carry semantic information matching the image content, so as to facilitate subsequent data storage and information processing. This makes it imposs