Word Sense Disambiguation Using IndoWordNet


Sudha Bhingardive and Pushpak Bhattacharyya

Abstract

Word sense disambiguation (WSD) is considered one of the toughest problems in the field of natural language processing. IndoWordNet is a linked structure of WordNets of major Indian languages. Recently, several IndoWordNet-based WSD approaches have been proposed and implemented for Indian languages. In this chapter, we present the use of further features of IndoWordNet for WSD, namely its linked WordNets and its lexico-semantic relations. We follow two unsupervised approaches: (1) using IndoWordNet in bilingual WSD to find the sense distribution with the help of the expectation maximization algorithm, and (2) using IndoWordNet in WSD to find the most frequent sense using word and sense embeddings. Both approaches show the importance of IndoWordNet for word sense disambiguation in Indian languages, as the results are found to be promising and can beat the baselines.

Keywords: IndoWordNet · WordNet · Word sense disambiguation · WSD · Bilingual WSD · Unsupervised WSD · Most frequent sense · MFS

S. Bhingardive (*) · P. Bhattacharyya
Department of Computer Science and Engineering, Indian Institute of Technology-Bombay, Powai, Mumbai, India

© Springer Science+Business Media Singapore 2017
N.S. Dash et al. (eds.), The WordNet in Indian Languages, DOI 10.1007/978-981-10-1909-8_15

15.1 Introduction

15.1.1 What is Word Sense Disambiguation?

Word sense disambiguation (WSD) is the task of identifying the correct meaning of a word in a given context. A word needs to be disambiguated only if it has multiple senses. In general, disambiguating a word requires both the context in which the word is used and knowledge about the word; without them, it is difficult to determine the exact meaning. Moreover, if the notion of a sense is not well defined, WSD becomes a very elusive task. The senses of a word differ from dictionary to dictionary: some dictionaries are coarse-grained, while others draw fine-grained distinctions between possible senses. This may be the reason why no WSD classifier achieves an accuracy of 100 %; even human experts cannot always agree on the sense of some words during manual disambiguation tasks. The following Hindi example illustrates WSD.

S1: raam ne bagiiche ke paudhon ko kaataa (Ram cuts plants of the garden)
S2: kutte ne billi ko kaataa (dog bites a cat)

Here, the word kaataa has two different senses. In sentence S1, the correct sense of kaataa is ‘to cut’, as it appears with the context words bagiichaa (garden) and paudhaa (plant). In sentence S2, however, the correct sense of kaataa is ‘to bite’, as it appears with the context word kutta (dog).
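The role of context words in the kaataa example can be made concrete with a small sketch. The code below is not one of the approaches presented in this chapter; it is a simple context-overlap heuristic (in the spirit of Lesk-style methods) that picks the sense whose clue words co-occur most with the sentence. The clue-word sets, the `SENSE_CLUES` name and the `disambiguate` function are hypothetical and hand-written for illustration; they are not taken from IndoWordNet.

```python
# Hypothetical, hand-written clue words for each sense of "kaataa".
# A real system would derive such information from a resource like IndoWordNet.
SENSE_CLUES = {
    "to cut": {"bagiiche", "paudhon", "ped", "kainchi"},
    "to bite": {"kutte", "saanp", "machchhar"},
}


def disambiguate(target, sentence, sense_clues):
    """Return the sense whose clue words overlap most with the context.

    Returns None when no clue word of any sense appears in the sentence.
    """
    # Context = all words of the sentence except the target word itself.
    context = set(sentence.lower().split()) - {target}
    # Score each sense by the number of clue words found in the context.
    scores = {sense: len(clues & context) for sense, clues in sense_clues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


s1 = "raam ne bagiiche ke paudhon ko kaataa"
s2 = "kutte ne billi ko kaataa"

print(disambiguate("kaataa", s1, SENSE_CLUES))  # to cut
print(disambiguate("kaataa", s2, SENSE_CLUES))  # to bite
```

With these toy clue sets, S1 is resolved to ‘to cut’ because of bagiiche and paudhon, and S2 to ‘to bite’ because of kutte. A sentence containing none of the clue words yields no decision, which mirrors the point that a word cannot be reliably disambiguated without informative context.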

15.1.2 Variants of Word Sense Disambiguation

The word sense disambiguat