Line Segmentation of Devanagari Ancient Manuscripts
RESEARCH ARTICLE
Sonika Rani Narang¹ • M. K. Jindal² • Munish Kumar³
Received: 27 January 2018 / Revised: 5 June 2019 / Accepted: 6 June 2019 © The National Academy of Sciences, India 2019
Abstract In this paper, we propose a line segmentation algorithm for Devanagari ancient documents that divides the document image into vertical stripes and then computes piecewise horizontal projection profiles. The average line height is computed, and based on this height, under-segmentation or over-segmentation of lines is detected as a step toward the recognition of ancient manuscripts. For the experimental evaluation of the line segmentation algorithm, Devanagari ancient manuscripts were collected from various libraries and museums and digitized using a scanner and a digital camera. The proposed algorithm was tested on 1500 text lines. To the best of our knowledge, this is the first work of its kind on line segmentation of Devanagari ancient manuscripts. The proposed algorithm can also be applied to other Indian scripts.
Keywords OCR · Segmentation · Line segmentation · Ancient manuscripts · Ancient documents · Historical document
Corresponding author: Munish Kumar
¹ Department of Computer Science, D. A. V College, Abohar, Punjab, India
² Department of Computer Science and Applications, Panjab University Regional Centre, Muktsar, Punjab, India
³ Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
1 Introduction
In the digital age, it has become essential for all available data to exist in a digital form that machines can recognize. Once documents are digitized, optical character recognition (OCR) is used to recognize the characters they contain. India holds a large number of historical documents handwritten in the Devanagari script, including manuscripts written by spiritual figures between the fifteenth and twentieth centuries. These manuscripts contain a wealth of information. However, because of their fragile condition, these documents, which run to thousands of pages, are not easily accessible. They must be digitized and converted to textual form so that machines can search them at rates of millions of pages per second. OCR plays an important role in recognizing text from document images. Many commercial OCR systems are available for non-Indian scripts and languages such as Arabic, Roman, Italian, and Spanish, whereas OCR for Indian scripts is still at the advanced research stage. OCR for printed Devanagari text has reached acceptable accuracy, but OCR for handwritten Devanagari documents, especially ancient ones, is still at a very early stage, and little work has been done in this area. In this paper, we present a line segmentation algorithm for Devanagari ancient documents.
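The abstract describes the approach only at a high level: vertical stripes, piecewise horizontal projection profiles, and an average line height used to detect under- and over-segmentation. The following Python sketch illustrates that general idea and is not the authors' implementation. The function names, the stripe count (`n_stripes=4`), the rule that any row with at least one ink pixel belongs to a text band, and the thresholds `low=0.5` and `high=1.5` (multiples of the average height) are all hypothetical choices. The sketch also does not join line pieces across neighbouring stripes or repair the segments it flags.

```python
from typing import List, Tuple

# Hypothetical representation: img[row][col] is 1 for ink, 0 for background.
Image = List[List[int]]
Segment = Tuple[int, int]  # (first_row, end_row), end_row exclusive


def stripe_bounds(width: int, n_stripes: int) -> List[Segment]:
    """Split columns 0..width-1 into n_stripes contiguous, near-equal ranges."""
    return [(i * width // n_stripes, (i + 1) * width // n_stripes)
            for i in range(n_stripes)]


def horizontal_profile(img: Image, c0: int, c1: int) -> List[int]:
    """Ink-pixel count of every row, restricted to columns c0..c1-1."""
    return [sum(row[c0:c1]) for row in img]


def profile_runs(profile: List[int], min_ink: int = 1) -> List[Segment]:
    """Maximal runs of consecutive rows whose ink count is >= min_ink."""
    runs: List[Segment] = []
    start = None
    for r, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = r  # a text band begins on this row
        elif count < min_ink and start is not None:
            runs.append((start, r))  # the band ended on the previous row
            start = None
    if start is not None:  # a band that touches the bottom edge
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image, n_stripes: int = 4,
                  low: float = 0.5, high: float = 1.5):
    """Return per-stripe line candidates, average line height, and flags.

    A candidate taller than high * average is flagged "under" (probably
    merged lines); one shorter than low * average is flagged "over"
    (probably a line fragment). Both factors are illustrative guesses.
    """
    width = len(img[0])
    per_stripe = [profile_runs(horizontal_profile(img, c0, c1))
                  for c0, c1 in stripe_bounds(width, n_stripes)]
    heights = [end - start for runs in per_stripe for start, end in runs]
    if not heights:  # blank page: nothing to measure
        return per_stripe, 0.0, []
    avg = sum(heights) / len(heights)
    flags = []
    for k, runs in enumerate(per_stripe):
        for start, end in runs:
            h = end - start
            if h > high * avg:
                flags.append((k, start, end, "under"))
            elif h < low * avg:
                flags.append((k, start, end, "over"))
    return per_stripe, avg, flags
```

Computing profiles per stripe rather than across the full page width is the usual motivation for a piecewise approach. A skewed or curved handwritten line stays roughly horizontal within a narrow stripe, so its projection band remains separable from its neighbours there.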
2 Related Work
Many techniques have been proposed in the literature for line segmentation of a document image. Survey