Historical document image binarization via style augmentation and atrous convolutions


S.I. : DICTA 2019

Hanif Rasyidi 1,3 · Salman Khan 2,3

Received: 20 March 2020 / Accepted: 23 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Historical documents suffer from a variety of degradations, making it challenging to recover the original textual content. The image binarization problem seeks to separate the original textual content from the image degradations. In this paper, we present a new binarization technique to accurately learn original text patterns from a limited amount of available historical document data. Our approach consists of a cascade of style augmentation and image binarization networks. The style augmentation network uses a random style transfer approach to increase the variety of the training data by generating new style patterns for the existing documents. The binarization network employs an encoder-decoder-based text segmentation approach with atrous convolutions to preserve spatial details. The resulting segmentations have a considerably low noise level and a smooth texture. Compared with other leading binarization methods from the DIBCO competitions, our proposed method achieves top performance across various evaluation measures.

Keywords Text binarization · Text segmentation · Style transfer · Atrous convolution · Document analysis

Hanif Rasyidi [email protected]
Salman Khan [email protected]

1 Data61, Canberra, Australia
2 Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
3 The Australian National University, Canberra, Australia

1 Introduction

Image binarization is one of the crucial steps in historical document analysis, since it allows us to extract meaningful information from old documents of degraded quality. This process removes artifacts present in the original manuscript to uncover the original writing for further processing steps, e.g., optical character recognition (OCR). Depending on when the record was written and how it was preserved, the document may have suffered physical damage and corruption. As illustrated in Fig. 1, historical documents suffer from various forms of degradation, such as stains, faded text, permeated ink, paper damage, character alteration due to stain or damage, or bad scan quality. Due to the diversity in document quality, automatic binarization methods need to consider multiple aspects when processing historical images to obtain the correct textual information. To achieve this goal, there are broadly two main approaches: noise removal-based and text localization-based binarization. The former utilizes image processing methods, while the latter uses machine learning techniques to recover the document contents. The image processing approaches use various image enhancement methods such as thresholding [20, 22, 25], contrast analysis [1], multi-spectral imaging [11], automatic parameter tu