Making Virtue of Necessity: A Verb Lexicon




1 Nuance Communications, Sunnyvale, USA [email protected]
2 IBM Research, Rio de Janeiro, Brazil
3 IBM Research and FGV/EMAp, Rio de Janeiro, Brazil
4 IBM Research, São Paulo, Brazil {fchalub,livym,alexrad}@br.ibm.com

Abstract. We describe the verb lexicon of OpenWordNet-PT, a wordnet-like resource for (mostly Brazilian) Portuguese, and a series of experiments that we designed to extend its coverage. These experiments include checking online lists of the most common verbs, checking freely available corpora such as the Bosque-UD (the Bosque corpus annotated with Universal Dependencies), and especially checking a dictionary of Brazilian politicians’ biographies (the DHBB), which we consider an ideal corpus for the kind of information extraction we are after. We certainly succeeded in extending the coverage of the verb lexicon; however, it remains to be seen whether this new coverage is enough for the original application.

1 Introduction

Verbs, together with nouns, are usually the main bearers of meaning in sentences. We could not agree more with [8] when they say: “Verbs are the primary vehicle for describing events and expressing relations between entities. Hence, verb semantics could help in many natural language processing (NLP) tasks that deal with events or relations between entities. For tasks which require canonicalization of natural language statements or derivation of (plausible) inferences from such statements, a particularly valuable resource is one which (i) relates verbs to one another and (ii) provides broad coverage of the verbs in the target language.”

Portuguese is the 6th most spoken language in the world, according to Ethnologue [11], but lexical resources for Portuguese are still not very well developed. Despite some recent work on Portuguese verbs, such as VerbNet.BR [19,20], Viper [3], and the catalog of Brazilian Portuguese Verbs [6], there are still no freely available, comprehensive resources that provide human users and automated programs with access to Portuguese verbs, their meanings, and information about their subcategorization frames.

V. de Paiva et al.
© Springer International Publishing Switzerland 2016. J. Silva et al. (Eds.): PROPOR 2016, LNAI 9727, pp. 271–282, 2016. DOI: 10.1007/978-3-319-41552-9_28

Given the essential role played by verbs in sentence understanding, we decided to improve the state of the verb lexicon in the basic resource OpenWordNet-PT [14]. OpenWordNet-PT already provides some of the desired functionality, as it has 5902 verbal synsets in Portuguese, with as many as 4511 verbal lemmas. It also has 7865 synsets in English that are empty in Portuguese, and for many of these we know there are Portuguese words that fit them perfectly but have not yet been added¹. An example is the verb popularize: the verb popularizar exists in Portuguese with the same sense as popularize has in English. We only need to add it to the appropriate synset, but our problem is to find out, within the ‘soup’ of these 7865 empty synsets, which ones are the easy cases, where a correspond