A novel approach for automatic Bengali question answering system using semantic similarity analysis
A. Das¹ · J. Mandal¹ · Z. Danial¹ · A. Pal² · D. Saha¹
Received: 10 December 2019 / Accepted: 28 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Finding the semantically accurate answer is one of the key challenges in advanced searching. In contrast to keyword-based searching, the meaning of a question or query matters here, and answers are ranked according to relevance. It is quite common for the question sentence and the answer sentence to share almost no words. This paper describes an approach for finding semantically relevant answers in a Bengali dataset. In the first phase of the algorithm, a set of statistical parameters such as frequency, index, and part-of-speech (POS) is matched between a question and the probable answers. In the second phase, entropy and similarity are calculated in separate modules. Finally, a sense score is generated to rank the answers. The algorithm is tested on a repository containing a total of 275,000 sentences. This Bengali repository is a product of the Technology Development for Indian Languages (TDIL) project, sponsored by the Govt. of India and provided by the Language Research Unit of the Indian Statistical Institute, Kolkata. The shallow parser developed by the LTRC group of IIIT Hyderabad is used for POS tagging. The actual answer is ranked 1st in 82.3% of cases and within the top five in 90.0% of cases. Using a confusion matrix, the accuracy of the system is 97.32% and its precision is 98.14%. The challenges and pitfalls of the work are reported at the end of the paper.
Keywords Semantic search · Automatic question answering in Bengali · Semantic similarity · NLP
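The reported figures (rank-1 rate, top-five rate, accuracy, precision) can be related to their usual definitions with a short sketch. It assumes the standard confusion-matrix formulas, since the text above does not state how the matrix was built. The function names, counts, and ranks below are hypothetical and are not the paper's data.

```python
# Sketch of the evaluation measures named in the abstract, assuming the
# standard definitions. All numbers here are hypothetical placeholders.

def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp: int, fp: int) -> float:
    """Share of positive decisions that were correct."""
    return tp / (tp + fp)

def hit_rate_at_k(ranks: list, k: int) -> float:
    """Fraction of questions whose actual answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical confusion-matrix counts: TP, FP, FN, TN.
tp, fp, fn, tn = 8, 1, 1, 10
# Hypothetical rank of the actual answer for ten questions.
ranks = [1, 1, 3, 1, 7, 2, 1, 1, 12, 1]

print(f"accuracy  = {accuracy(tp, fp, fn, tn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
print(f"rank 1    = {hit_rate_at_k(ranks, 1):.2f}")
print(f"top 5     = {hit_rate_at_k(ranks, 5):.2f}")
```

Under these definitions, the rank-1 and top-five rates depend only on where the actual answer falls in each ranked list, while accuracy and precision depend on how each retrieved answer is labelled as correct or incorrect.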
1 Introduction
Keyword-based searching algorithms focus mainly on the number of words matched between the query and the result. The results are then filtered and ranked based on a few parameters such as location, user cache, and preference. In contrast, a semantic searching technique first processes the query to understand the meaning of the question. Relevant answers are then retrieved based on context matching, sense matching, etc. It is quite possible that the query and the answer have hardly any word in common. Let us consider an example:
* A. Das · A. Pal
¹ Department of CSE, Faculty of Engineering and Technology, Jadavpur University, Kolkata 700032, India
² College of Engineering and Management, Purba Medinipur, Kolaghat, West Bengal, India
“রামের মা এর নাম কি?” (What is the name of the mother of Rām?). In keyword-based searching, content words such as “রাম” (Rām)/Ram, “মা” (mā)/Mother, and “নাম” (nām)/Name are searched for in the sentences present in the repository, and the answer is retrieved only if these keywords occur in a sentence. But if the repository contains a sentence such as: কৌশল্যার এক মাত্র সন্তান রাম । (Kaush
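
The keyword-based behaviour described in this example can be illustrated with a minimal sketch. It assumes simple whitespace tokenization and takes the content words exactly as listed above. The names `tokenize` and `keyword_match` are hypothetical and do not come from the paper. The sketch shows that the repository sentence shares only one of the three content words with the question, so a matcher requiring all keywords would not retrieve it.

```python
# Sketch of strict keyword matching. The content words are those listed in the
# example; the whitespace tokenizer and function names are assumptions.

def tokenize(sentence: str) -> set:
    """Split on whitespace and strip the Bengali danda and common punctuation."""
    punctuation = "।?!,."
    tokens = (tok.strip(punctuation) for tok in sentence.split())
    return {tok for tok in tokens if tok}

def keyword_match(content_words: set, sentence: str) -> bool:
    """Return True only if every content word occurs in the sentence."""
    return content_words <= tokenize(sentence)

content_words = {"রাম", "মা", "নাম"}
candidate = "কৌশল্যার এক মাত্র সন্তান রাম ।"

# Content words that the candidate sentence actually contains.
shared = content_words & tokenize(candidate)
print("shared content words:", shared)
print("retrieved by strict keyword match:", keyword_match(content_words, candidate))
```

A real keyword search would also need stemming to reduce inflected forms such as “রামের” to “রাম”; that step is omitted here and the content words are supplied directly.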