Ancient text recognition: a review

Sonika Rani Narang¹ · M. K. Jindal² · Munish Kumar³

© Springer Nature B.V. 2020

Abstract Optical character recognition (OCR) is an important research area in the field of pattern recognition. Libraries and offices hold a large volume of paper-based data, and ancient text documents contain a wealth of knowledge. Maintaining and searching this paper-based data is a challenge, and efforts to digitize it are under way in many places. Scanning paper documents digitizes them, but the scanned data is in pictorial form. Computers cannot read it as text, because they represent characters only as ASCII or other standard codes. Therefore, alphanumeric information must be extracted from the scanned images. An OCR system converts a document into electronic text that can be edited, searched, and otherwise processed. OCR is the machine replication of human reading and has been the subject of intensive research for more than six decades. This paper presents a comprehensive survey of the work done in the various phases of OCR, with special focus on OCR for ancient text documents. It is intended to help novice researchers by providing a comprehensive study of the segmentation, feature extraction, and classification techniques required for an OCR system, especially for ancient documents. It has been observed that limited work has been done on the recognition of ancient documents, especially those in Devanagari script. This article also presents future directions for upcoming researchers in the field of ancient text recognition.

Keywords OCR · Feature extraction · Classification · Devanagari · Ancient
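The feature extraction and classification phases named in the abstract can be illustrated with a minimal sketch. It assumes that each segmented character is already a binary grid (1 = ink). It uses zoning (the ink density of each cell of a coarse grid) as the feature and a 1-nearest-neighbour rule as the classifier. Both are common choices in the OCR literature but are picked here only for illustration. The function names `zoning_features` and `classify_1nn` and the sample glyphs are hypothetical.

```python
# Minimal sketch (illustrative, not the method of any surveyed system):
# zoning features + 1-nearest-neighbour classification of binary glyphs.
from math import dist
from typing import List, Sequence, Tuple

Glyph = List[List[int]]  # rows of pixels, 1 = ink, 0 = background


def zoning_features(glyph: Glyph, zones: int = 2) -> List[float]:
    """Split the glyph into zones x zones cells; return each cell's ink density."""
    h, w = len(glyph), len(glyph[0])
    feats: List[float] = []
    for zr in range(zones):
        r0, r1 = zr * h // zones, (zr + 1) * h // zones
        for zc in range(zones):
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cell = [glyph[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            # Guard against empty cells when the glyph is smaller than the zone grid.
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


def classify_1nn(feats: Sequence[float],
                 train: Sequence[Tuple[List[float], str]]) -> str:
    """Return the label of the training vector nearest to feats (Euclidean distance)."""
    return min(train, key=lambda sample: dist(feats, sample[0]))[1]


# Usage with two hypothetical 4x4 prototypes.
backslash = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
slash = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
train = [(zoning_features(backslash), "backslash"),
         (zoning_features(slash), "slash")]
noisy = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # backslash + 1 stray pixel
print(classify_1nn(zoning_features(noisy), train))  # prints "backslash"
```

The stray pixel changes only one zone's density, so the noisy glyph stays closest to its prototype. A practical system for ancient documents would need richer features and a stronger, trained classifier. The sketch only shows how the two phases connect.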

* Munish Kumar

1 Department of Computer Science, D.A.V. College, Abohar, Punjab, India

2 Department of Computer Science and Applications, Panjab University Regional Centre, Muktsar, Punjab, India

3 Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India

1 Introduction

In this digital age, it has become essential to have all available information in a digital form that machines can recognize. In the digitization phase, printed or handwritten text is converted into digital form by scanning the document, by photographing it with a digital camera, or by writing with a digitizer connected to an LCD. Character recognition is the process of identifying characters from these sources. Several issues make the development of an optical character recognition (OCR) system very complex. Some of them are discussed below:

• Individual writing styles make the development of an OCR system for handwritten documents very difficult.
• Noisy and degraded documents make pre-processing a complex task.
• Touching and overlapping characters make segmentation …
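The pre-processing and segmentation issues listed above can be made concrete with a minimal sketch. It assumes a grayscale text line stored as a list of rows of 8-bit intensities (0 = black, 255 = white), with dark ink on light paper. Binarization uses Otsu's global threshold, and segmentation uses a vertical projection profile. These are standard textbook techniques chosen for illustration, not necessarily those used in the surveyed work. The function names and sample image are hypothetical.

```python
# Minimal sketch (illustrative): Otsu binarization followed by
# vertical-projection segmentation of a single text line.
from typing import List, Optional, Tuple

Gray = List[List[int]]  # rows of 8-bit intensities, dark ink on light paper


def otsu_threshold(gray: Gray) -> int:
    """Return t maximizing between-class variance; pixels <= t form the dark class."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(gray: Gray, t: int) -> List[List[int]]:
    """Map dark pixels (<= t) to 1 (ink) and the rest to 0 (background)."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Cut a text line at blank columns; return [start, end) column ranges."""
    profile = [sum(col) for col in zip(*binary)]  # ink pixels per column
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x
        elif not count and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


# Usage on a hypothetical 3x7 grayscale strip.
gray = [[20, 30, 250, 240, 10, 250, 15],
        [25, 250, 245, 250, 20, 250, 10],
        [30, 20, 250, 240, 15, 240, 20]]
t = otsu_threshold(gray)
print(segment_columns(binarize(gray, t)))  # prints [(0, 2), (4, 5), (6, 7)]
```

This segmenter cuts only at columns that contain no ink. Touching or overlapping characters therefore end up in a single segment, which illustrates the segmentation difficulty noted above. A single global threshold likewise struggles on unevenly degraded pages.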
