Validating a Tool for Evaluating Automatically Lexical Triples Mined from Texts



Abstract. We report on an ongoing effort to assess an automated ontology evaluation method that uses lexical frequencies to determine which of the lexical triples automatically mined from a textual corpus are relevant. The aim is to obtain a light-weight automatic ontology evaluation method that can easily be applied by knowledge engineers to determine whether or not the most important notions and relationships are represented in a set of ontology triples.

1 Problem

More and more ontologies are created automatically to avoid the knowledge acquisition bottleneck. The question then arises of how to enable individuals not skilled in ontology engineering to estimate the "value" of automatically generated ontological material. Our hypothesis is that relatively crude evaluation methods suffice to give a general indication of how well the material fits the domain under scrutiny. But even the automated evaluation method itself first needs to be assessed by human experts and checked for validity.

2 Methods

Two texts (the EU directives on privacy and VAT) have been selected to be processed by an unsupervised text miner (a combination of a shallow parser and statistically based filters built by CNTS Antwerp). Series of triples (consisting of combinations of noun phrases, verbs and prepositional phrases) have been generated. Some evaluation metrics have been defined using a simple and basic method. First, the relevant terms of a text (compared to a neutral corpus) are determined. This allows us to compare the set of relevant words with the vocabulary of the triples generated by the unsupervised miner. In addition, a list of VAT terms and expressions was available to validate the set of automatically retrieved terms. Unfortunately, no such list existed for the privacy domain. Afterwards, the triples themselves are scored as well: a score indicates how many characters of the three triple parts (expressed as an averaged percentage) are matched by relevant words. The evaluation method can be kept simple and straightforward: it is only meant as a check to determine whether or not the more sophisticated text miner has done a good job. Finally, to assess the quality of the automatic evaluator, human experts have manually checked the outcomes (for the privacy directive). In the case of the VAT directive no experts were available; instead, the manually created list of relevant terms and expressions had to be used.
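The paper does not spell out the underlying formulas, so the Python sketch below is only one plausible reading of the two steps, under assumptions of ours: a term counts as relevant when its relative frequency in the domain text sufficiently exceeds its relative frequency in a neutral reference corpus (the ratio_threshold and the add-one smoothing are hypothetical choices), and a triple's score is the percentage of characters of its three parts covered by relevant words, averaged over the parts.

    from collections import Counter
    from typing import List, Set, Tuple

    def relevant_terms(domain_tokens: List[str],
                       neutral_tokens: List[str],
                       ratio_threshold: float = 2.0) -> Set[str]:
        """Terms whose relative frequency in the domain text exceeds their
        relative frequency in a neutral corpus by a (hypothetical) threshold."""
        dom = Counter(t.lower() for t in domain_tokens)
        neu = Counter(t.lower() for t in neutral_tokens)
        n_dom, n_neu = len(domain_tokens), len(neutral_tokens)
        relevant = set()
        for term, count in dom.items():
            p_dom = count / n_dom
            p_neu = (neu[term] + 1) / (n_neu + 1)  # add-one smoothing for unseen terms
            if p_dom / p_neu >= ratio_threshold:
                relevant.add(term)
        return relevant

    def triple_score(triple: Tuple[str, str, str], relevant: Set[str]) -> float:
        """Averaged percentage of the characters of the three triple parts
        that are matched by relevant words."""
        coverages = []
        for part in triple:
            words = part.split()
            matched = sum(len(w) for w in words if w.lower() in relevant)
            total = sum(len(w) for w in words) or 1
            coverages.append(100.0 * matched / total)
        return sum(coverages) / len(coverages)

    # Hypothetical usage:
    #   rel = relevant_terms(vat_tokens, neutral_tokens)
    #   triple_score(("taxable person", "supplies", "goods or services"), rel)

A triple whose noun phrases and verb consist largely of domain-relevant words thus scores close to 100, while a triple built from general-language words scores near 0.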

3 Discussion

The figure shows how the experiment has been set up. A problem was that the experiments could not be performed completely in parallel. Also, the inter-rater agreement among the privacy experts was extremely low (due to their differing backgrounds).
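How inter-rater agreement was measured is not specified in the text. For two raters labelling the same items, a common choice is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e); the sketch below is our illustration of that measure, not the paper's procedure, assuming each expert marks every triple as relevant or irrelevant.

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Cohen's kappa for two raters over the same items,
        e.g. two experts marking triples as 'rel' or 'irr'."""
        n = len(labels_a)
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        # agreement expected by chance, from each rater's label distribution
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
                  for c in set(labels_a) | set(labels_b))
        return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

    # cohens_kappa(["rel", "rel", "irr"], ["rel", "irr", "irr"])  # -> 0.4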

4 Conclusion

Even if the current experiments give a partially indecisive answer to the simple question of whether the automatic evaluation procedure is up to providing a reliable indication of the quality …