Ancient text recognition: a review
Sonika Rani Narang¹ · M. K. Jindal² · Munish Kumar³
© Springer Nature B.V. 2020
Abstract Optical character recognition (OCR) is an important research area in the field of pattern recognition, and it has been studied extensively over the last 60 years. Libraries and offices hold large volumes of paper-based data, and ancient text documents contain a wealth of knowledge. Maintaining and searching this paper-based data is a challenge, and efforts are under way in many places to digitize it. Paper-based documents are scanned for digitization, but the scanned data is in pictorial form. Computers cannot interpret it as text, because they process characters as ASCII or other character codes. Therefore, the alphanumeric information must be retrieved from the scanned images. An OCR system converts a document into electronic text, which can then be edited, searched, and so on. An OCR system is the machine replication of human reading and has been the subject of intensive research for more than six decades. This paper presents a comprehensive survey of the work done in the various phases of an OCR system, with special focus on OCR for ancient text documents. It will help novice researchers by providing a comprehensive study of the phases required for an OCR system, namely segmentation, feature extraction and classification, especially for ancient documents. It has been observed that only limited work has been done on the recognition of ancient documents, especially those in Devanagari script. This article also presents future directions for upcoming researchers in the field of ancient text recognition.
Keywords OCR · Feature extraction · Classification · Devanagari · Ancient
* Munish Kumar [email protected]
1 Department of Computer Science, D.A.V. College, Abohar, Punjab, India
2 Department of Computer Science and Applications, Panjab University Regional Centre, Muktsar, Punjab, India
3 Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
1 Introduction
In this digital age, it has become necessary to have all available information in a digital form that machines can recognize. In the digitization phase of any document, the printed or handwritten text is converted into digital form by scanning the document, by photographing it with a digital camera, or by writing with a digitizer connected to an LCD. Character recognition is the process of identifying characters from these sources. Many issues make the development of an optical character recognition (OCR) system very complex. Some of these issues are discussed below:
• Unique writing styles make development of an OCR system for handwritten documents very difficult.
• Noisy and degraded documents make pre-processing a complex task.
• Touching and overlapping characters make segmentatio