Comparison of Different Similarity Functions on Hindi QA System



Abstract This paper presents a comparative analysis of different similarity measures for a Hindi question answering system that uses a machine learning approach, from both information retrieval and classification perspectives. Many machine learning tasks require similarity functions that evaluate the likeness between observations. Similarity computations are particularly important for clustering, which depends on a precise estimate of the distance between data points. The framework is considered for data matching of multiphrase words and misspelled words.

Keywords Hindi question answering system · Machine learning · Data mining · Similarity functions · Text similarity measure · N-gram approach · Jaccard coefficient similarity · Euclidean similarity measure · Jaro–Winkler

B. Sneha (✉), Department of Computer Science and Engineering, Banasthali Vidyapith, Banasthali, India, e-mail: [email protected]
D. Mohit · V. Zorawar Singh, Department of Computer Engineering, National Institute of Technology, Kurukshetra, India, e-mail: [email protected]
V. Zorawar Singh, e-mail: [email protected]
© Springer Science+Business Media Singapore 2016. S.C. Satapathy et al. (eds.), Proceedings of International Conference on ICT for Sustainable Development, Advances in Intelligent Systems and Computing 408, DOI 10.1007/978-981-10-0129-1_68

1 Introduction
A question answering (QA) system includes a data matching step that aims to determine whether two data occurrences represent the same entity. This approximate data matching relies on similarity functions [1]. Similarity measures have become a widely used tool in machine learning. One of the problems that arises in a QA system built with machine learning is data mining: data is the essential entity or fact of concern, but we must know how to retrieve or extract useful entities from large volumes of raw data. Data mining techniques help accomplish this [1], and they depend on distance estimates between observations. The notion of similarity can differ depending on the particular domain, task, or dataset available. It is therefore desirable to learn similarity functions from training data in order to capture the correct notion of distance for a particular task in a given domain. Another key application that can benefit from learnable similarity functions is clustering [2].
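To make the idea of approximate data matching concrete, the sketch below implements the Jaro–Winkler measure, which the keywords list among the compared functions. This excerpt does not show how the authors compute it, so the code follows the commonly cited textbook definition and should be read as an assumption: the function names `jaro` and `jaro_winkler`, the prefix scale of 0.1, and the maximum prefix length of 4 are illustrative choices, not details taken from the paper.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and lie
    # within this many positions of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order,
    # counted in pairs.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix.

    prefix_scale and max_prefix are the conventional defaults (assumed here).
    """
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # A transposed-letter misspelling still scores close to 1.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Python compares strings by Unicode code point, so the same functions accept Devanagari input, although combining vowel signs are then treated as separate characters. Whether that granularity suits Hindi misspellings is a design question this sketch does not settle.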

2 Different Similarity Functions
A text document can be modeled in many ways, with "bag-of-words" being the most prominent representation [3] in IR and data mining. A count is maintained for each word in the bag, and each word corresponds to a dimension in the resulting data space. Consequently, a word that appears in the document with high frequency contributes a high weight. This weight can increase when stemming is applied, because the N variants of a base word add up. Accurate clustering requires a precise definition of the closeness between a pair of topics, established through pairwise comparison. In our work, we first apply the N-gram approach to the dataset. In [4], A.K. Patid
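The representations and measures named in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: whitespace tokenization, raw term counts as weights, character-level N-grams, and the helper names `bag_of_words`, `char_ngrams`, `jaccard`, and `euclidean_distance` are all hypothetical choices, and the two Hindi questions are made-up examples rather than items from the paper's dataset. The Euclidean function returns a distance, so smaller values mean more similar texts.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map each whitespace-separated word to its count (assumed tokenization)."""
    return Counter(text.split())


def char_ngrams(text: str, n: int = 2) -> list:
    """Return the overlapping character n-grams of text, by Unicode code point."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def jaccard(items_a, items_b) -> float:
    """Jaccard coefficient: size of the intersection over size of the union."""
    set_a, set_b = set(items_a), set(items_b)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def euclidean_distance(bag_a: Counter, bag_b: Counter) -> float:
    """Euclidean distance between two count vectors over their joint vocabulary."""
    vocabulary = set(bag_a) | set(bag_b)
    # A Counter returns 0 for words it has not seen.
    return math.sqrt(sum((bag_a[w] - bag_b[w]) ** 2 for w in vocabulary))


if __name__ == "__main__":
    q1 = "भारत की राजधानी क्या है"   # "What is the capital of India"
    q2 = "भारत की राजधानी कहाँ है"   # "Where is the capital of India"

    bag1, bag2 = bag_of_words(q1), bag_of_words(q2)
    print("word Jaccard:", jaccard(bag1, bag2))
    print("bigram Jaccard:", jaccard(char_ngrams(q1), char_ngrams(q2)))
    print("Euclidean distance:", euclidean_distance(bag1, bag2))
```

The two questions share four of six distinct words, so the word-level Jaccard coefficient is 4/6, and the count vectors differ by one in two dimensions, giving a Euclidean distance of the square root of 2. Character N-grams compare the strings below the word level, which is what lets them tolerate spelling variation.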
