HINDIA: a deep-learning-based model for spell-checking of Hindi language


ORIGINAL ARTICLE

Shashank Singh1 · Shailendra Singh1

Received: 7 November 2019 / Accepted: 13 July 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract A spelling error is a mistake made while typing a text document. Applications such as search engines, information retrieval systems, and email rely on typed user input, and in such applications a good spell-checker is essential for correcting misspellings. Spell-checkers for Western languages such as English are powerful and can handle a wide range of spelling errors, whereas for Indian languages such as Hindi, Urdu, Bengali, Kannada, and Assamese the available spell-checkers are very basic. These spell-checkers are built with traditional approaches such as statistical and rule-based methods. This article presents a novel model, HINDIA, for handling spelling errors in Hindi, one of the most spoken languages in India. It uses a deep-learning method for spelling error detection and correction. The proposed spell-checking model works in two phases: in the first phase, the model identifies the erroneous words in the input sample, and in the second phase, it replaces the wrong words with the most probable correct words. HINDIA is built on an attention-based encoder–decoder bidirectional recurrent neural network (BiRNN) that uses long short-term memory cells. Several modifications have been made to the BiRNN, and the network is fine-tuned to process spelling errors in Hindi. The model is trained and tested on the publicly available 'monolingual corpus' dataset developed by IIT Mumbai. Its performance is evaluated in two scenarios. In the first scenario, where the test set is generated using a split function, HINDIA achieves precision 0.86, recall 0.72, f-measure 0.78 and accuracy 0.80. In the second scenario, where the test set is generated manually, it achieves precision 0.81, recall 0.72, f-measure 0.76 and accuracy 0.74. HINDIA performs better than the deep-learning-based Malayalam spell-checker and some other deep-learning-based correction models in the literature.

Keywords Spelling · Spell-checker · Deep-learning · Long short-term memory · Encoder–decoder recurrent neural network
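The reported f-measure values can be checked against the reported precision and recall. The sketch below assumes that the f-measure in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state this explicitly, and the function name `f_measure` and the `beta` parameter are illustrative, not taken from the article.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall.

    Assumption: the article's "f-measure" is this balanced F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall, reported f-measure) as given in the abstract.
reported = {
    "split-generated test set": (0.86, 0.72, 0.78),
    "manually generated test set": (0.81, 0.72, 0.76),
}

for name, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    print(f"{name}: F1 = {f:.2f} (reported {f_reported:.2f})")
```

Under this assumption both computed values round to the figures given in the abstract, so the reported precision, recall and f-measure are mutually consistent. Accuracy cannot be checked the same way, because it also depends on the number of true negatives, which the abstract does not report.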

Shashank Singh · Shailendra Singh
1 Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Chandigarh, India

1 Introduction

Artificial intelligence (AI) has become the most sought-after field of research in this information age. AI uses deep-learning (DL) techniques to build systems that assist humans. DL is a method that uses past experience to teach a machine to answer a particular question [1]. Nowadays, DL methods are used extensively by artificial intelligence and natural language processing (NLP) researchers [2]. NLP
