Constituency Parsing of Complex Noun Sequences in Hindi



1 Language Technologies Research Centre, International Institute of Information Technology, Hyderabad, India
2 Department of Sanskrit Studies, University of Hyderabad, India

Abstract. A complex noun sequence is one in which a head noun is recursively modified by one or more bare nouns and/or genitives¹. Constituency analysis of a complex noun sequence is a prerequisite for finding the dependency relations (semantic relations) between the components of the sequence. Identifying these dependency relations is useful for various applications such as question answering, information extraction, textual entailment, and paraphrasing. In Hindi, syntactic agreement rules can handle the parsing of recursive genitives to a large extent (Sharma, 2012)[12]. This paper implements frequency-based, corpus-driven approaches for parsing the recursive genitive structures that syntactic rules cannot handle, as well as recursive compound nouns and combinations of genitive and compound noun sequences. Using syntactic rules and the dependency global algorithm, an accuracy of 92.85% is obtained.

Keywords: constituency parsing, bracketing, complex noun sequence, compound noun, genitives.

1 Introduction

A noun can take various pre-modifiers, such as an adjective, an adjectival phrase, a bare noun (henceforth, compound noun), or a genitive noun. The case becomes complex when a head noun² is modified recursively, as in:

1. ( ladake kA ( mittI kA ghar ) )
   “boy” genitive-marker “mud” genitive-marker “house”
2. ( jilA ( nirvAchan adhikArI ) )
   “district” “election” “officer”

Or when a head noun is modified by a complex modifier. Example:

3. ( ( AdamI ke bete ) kA ghar )
   “man” genitive-marker “son” genitive-marker “house”

¹ Genitive markers in Hindi are kA and its allomorphic variations ke and kI.
² Hindi is a head-final language.

A. Gelbukh (Ed.): CICLing 2014, Part I, LNCS 8403, pp. 285–296, 2014. © Springer-Verlag Berlin Heidelberg 2014


A. Batra, S. Paul, and A. Kulkarni

4. ( ( krishi prasanskaraNa ) udyog )
   “agriculture” “processing” “industry”
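The difference between the right-branching bracketings (examples 1 and 2) and the left-branching ones (examples 3 and 4) can be made concrete by listing every binary bracketing of a short sequence. The following is a minimal sketch, not the method used in this paper: the function name `bracketings` and the nested-tuple representation are assumptions made for illustration, and genitive markers are left out so that only bare nouns are bracketed.

```python
def bracketings(nouns):
    """Yield every binary bracketing of a list of nouns as nested tuples.

    Hypothetical helper for illustration only; it enumerates candidates
    and does not choose among them.
    """
    if len(nouns) == 1:
        yield nouns[0]
        return
    # Try every split point; each side is bracketed recursively.
    for split in range(1, len(nouns)):
        for left in bracketings(nouns[:split]):
            for right in bracketings(nouns[split:]):
                yield (left, right)


# Example 2 from the text: three bare nouns give two candidate bracketings.
candidates = list(bracketings(["jilA", "nirvAchan", "adhikArI"]))
```

For three nouns the two candidates are the right-branching analysis of example 2 and the left-branching shape seen in example 4; a parser has to choose between them.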

A complex noun sequence is a sequence of multiple nouns, which may or may not be separated by genitive markers. When no genitive marker is present between the nouns, the sequence is known as a compound noun. A noun sequence can be represented by the following regular expression[1]:

( noun+³ genitive-marker )*⁴ noun+
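The regular expression above can be checked mechanically once each token is mapped to a tag. The sketch below is illustrative only and makes two assumptions that are not from the paper: the tag letters `N` and `G`, and the simplification that every token other than kA, ke, or kI is a noun (a real system would rely on a part-of-speech tagger). The names `tag_string` and `is_noun_sequence` are hypothetical.

```python
import re

# (noun+ genitive-marker)* noun+ over one-letter tags: N = noun, G = genitive marker.
NOUN_SEQUENCE = re.compile(r"(?:N+G)*N+")

# Genitive markers of Hindi as listed in the text.
GENITIVE_MARKERS = {"kA", "ke", "kI"}


def tag_string(tokens):
    """Map tokens to tags, assuming every non-marker token is a noun."""
    return "".join("G" if tok in GENITIVE_MARKERS else "N" for tok in tokens)


def is_noun_sequence(tokens):
    """Return True if the whole token list matches the noun-sequence pattern."""
    return NOUN_SEQUENCE.fullmatch(tag_string(tokens)) is not None


genitive_example = is_noun_sequence("ladake kA mittI kA ghar".split())
compound_example = is_noun_sequence("jilA nirvAchan adhikArI".split())
dangling_marker = is_noun_sequence("ladake kA".split())
```

Under these assumptions the first two calls accept examples 1 and 2, while a sequence that ends in a genitive marker is rejected because the pattern requires a final noun.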

Binary constituency parsing of a noun with a complex modifier is an important requirement for determining the semantic relation between the noun and its modifier. Sharma (2012)[12] uses agreement rules for parsing nouns with recursive genitive modifiers and reports an accuracy of 80%. Syntactic agreement rules alone fail to determine the constituents when the allomorphic forms of the genitive marker are the same, as in:

5. ( ladake kA ( patthar kA ghar ) )
   “boy” genitive-marker “stone” genitive-marker “house”
6. ( ( vimAnoM kI kharId ) kI yojanA )
   “aircraft” genitive-marker “purchases” genitive-marker “plan”

Syntactic rules also fail for bare noun sequences. For handling