Contrast Based Color Plane Selection for Binarization of Historical Document Images



Abstract This paper primarily focuses on establishing that the document image processing and natural image processing domains are mutually dependent, a relationship that few researchers have explored experimentally. For a given color image, the contrast-per-pixel (CPP) of each color channel is computed, and the channel that exhibits the highest CPP value is binarized. To evaluate the proposed method, the color image is also converted to a grayscale image using a weighted color-to-grayscale conversion and then binarized. Otsu's algorithm is used for binarization. Images from the DIBCO and H-DIBCO datasets were used to evaluate the proposed algorithm. The resulting binary images were assessed using precision metrics, which show that the channel with the highest CPP performs better. Experimentally, the extracted color channels performed marginally better than the weighted color-to-grayscale converted image, which indicates that document image binarization depends on natural image processing.
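The channel-selection step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the CPP formula, so the definition used here (the mean absolute difference between each pixel and its in-bounds 8-neighbours, averaged over the image) is an assumption, and the function names `contrast_per_pixel` and `select_channel` are hypothetical. Plain Python lists are used to keep the example self-contained; a practical implementation would likely use an array library.

```python
def contrast_per_pixel(channel):
    """Average local contrast of one colour plane.

    ASSUMPTION: CPP is taken as the mean absolute difference between a
    pixel and its in-bounds 8-neighbours, averaged over all pixels.
    The paper's exact formula may differ.
    `channel` is a list of rows of integer intensities.
    """
    rows, cols = len(channel), len(channel[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            diffs = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the centre pixel itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        diffs.append(abs(channel[r][c] - channel[rr][cc]))
            if diffs:  # a 1x1 image has no neighbours
                total += sum(diffs) / len(diffs)
    return total / (rows * cols)


def select_channel(image):
    """Return (name, plane) for the colour plane with the highest CPP.

    `image` is a list of rows of (R, G, B) tuples. Ties go to the
    first plane in R, G, B order.
    """
    planes = {
        name: [[pixel[k] for pixel in row] for row in image]
        for k, name in enumerate(("R", "G", "B"))
    }
    best = max(planes, key=lambda name: contrast_per_pixel(planes[name]))
    return best, planes[best]
```

For an image whose red plane is flat and whose blue plane alternates between 0 and 255, `select_channel` returns the blue plane, which would then be passed on to the binarization step.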



Keywords Document image processing ⋅ Natural image processing ⋅ Otsu binarization ⋅ Contrast measure ⋅ Binarization metrics ⋅ Precision

1 Introduction

Historical document image processing remains one of the most challenging areas of image processing. The main difficulties are severe degradation of the document, unstructured character positioning, variations within the same document, variation in appearance, and bleed-through in documents written with inks of the same or different colors [1, 2].

M.E. Paramasivam (✉) ⋅ R.S. Sabeenian
Department of Electronics and Communication Engineering, Sona College of Technology, Sona Nagar, T.P.T. Road, Salem 636005, India
e-mail: [email protected]
R.S. Sabeenian
e-mail: [email protected]

© Springer Science+Business Media Singapore 2017
K.R. Attele et al. (eds.), Emerging Trends in Electrical, Communications and Information Technologies, Lecture Notes in Electrical Engineering 394, DOI 10.1007/978-981-10-1540-3_26


Since 2009, the DIBCO and H-DIBCO [3] contests have encouraged a number of researchers to contribute to the domain. As a first step in document image processing, the image is binarized to obtain a clear classification of text and background, which enables quicker segmentation of text from the image. We use Otsu's method [4] for binarization: almost two decades after its publication, the analysis of its properties in [5] indicated that it is the best global binarization method. Although the problem is primarily one of splitting the image into two broad sets, one for the text and the other for the background, it is significant that images vary in color, contrast, brightness, etc. Hence, it is worthwhile to analyze the image for variation in these parameters using natural image processing and then subject it to the best-performing binarization algorithm.
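For readers unfamiliar with the global binarization step mentioned above, the following is a minimal sketch of the standard Otsu threshold search, which picks the threshold that maximizes the between-class variance of the intensity histogram. It is an illustration rather than the authors' implementation. The function names, the 256-level assumption, and the convention that text is darker than the background (pixels at or below the threshold are labeled 1) are assumptions made for this example.

```python
def otsu_threshold(channel, levels=256):
    """Return the threshold t that maximises between-class variance.

    `channel` is a list of rows of integer intensities in [0, levels).
    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for row in channel:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1/total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(channel, threshold):
    """Label pixels <= threshold as 1 (assumed dark text), others as 0."""
    return [[1 if v <= threshold else 0 for v in row] for row in channel]
```

Because a single threshold is applied to the whole image, the result depends on how well separated text and background are in the chosen plane, which is why the choice of input channel matters.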

2 Algebra of Image Processing

Let us consider a value set φw, which shall represent all the possible valu