An efficient generic approach for automatic taxonomy generation using HMMs


THEORETICAL ADVANCES

Sylvain Iloga (1, 2, 4) · Olivier Romain (2) · Maurice Tchuenté (3, 4)

Received: 5 March 2020 / Accepted: 9 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Taxonomies are essential tools for fast information retrieval and classification of knowledge. Many existing techniques for automatic taxonomy generation depend strongly on the specific properties of a particular domain and are consequently hard to apply to other domains. Some attempts have been made to design taxonomies for multiple domains. Unfortunately, they induce high hierarchical classification error rates for some datasets. The automatic design of a taxonomy requires the capability of measuring the similarity between classes. More precisely, the fact that two classes are close intuitively implies that some elements of one class are scattered in the neighborhood of some elements of the other class. This observation is used in this paper to propose a new generic technique for automatic taxonomy generation. A topological analysis of the neighborhood of each instance is first performed. The results of this analysis are used to initialize and train a hidden Markov model (HMM) for each class. The model of a given class c captures the frequencies of the classes found in the neighborhood of the instances of c, from the most dominant class to the least dominant. The similarities between these models are finally used to derive a taxonomy. Hierarchical classification experiments conducted on 20 datasets from various domains showed an average accuracy of 97.22% with a standard deviation of 4.11%. Comparison results revealed that the proposed approach outperforms existing work, with accuracy gains reaching 38.62% for one dataset.

Keywords: Automatic taxonomy generation · Hidden Markov models · Hierarchical classification
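The neighborhood analysis summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that the neighborhood of an instance is its k nearest neighbors under Euclidean distance, and the names `neighborhood_profile`, `points`, `labels` and `k` are hypothetical. For each class c, it only computes the frequencies of the classes found around the instances of c, ordered from the most dominant to the least dominant; the HMM initialization and training that the paper builds on such frequencies are omitted.

```python
from collections import Counter
import math


def neighborhood_profile(points, labels, k=3):
    """For each class c, return the classes found among the k nearest
    neighbors of the instances of c, as (class, frequency) pairs sorted
    from the most dominant class to the least dominant.

    Assumption: "neighborhood" means k nearest neighbors (Euclidean)."""
    counts = {c: Counter() for c in set(labels)}
    for i, p in enumerate(points):
        # Distances from instance i to every other instance.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        # Tally the labels of the k closest instances under the class of i.
        for _, j in dists[:k]:
            counts[labels[i]][labels[j]] += 1

    profile = {}
    for c, counter in counts.items():
        total = sum(counter.values())
        # Normalize counts to frequencies; empty if no neighbor was found.
        profile[c] = (
            [(other, n / total) for other, n in counter.most_common()]
            if total
            else []
        )
    return profile


# Toy data (made up for illustration): classes "a" and "b" are adjacent,
# class "c" is isolated far from both.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (2, 2),
          (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

profile = neighborhood_profile(points, labels, k=2)
for cls in sorted(profile):
    print(cls, profile[cls])
```

In this toy example, class "b" shows up in the neighborhood profile of class "a" while class "c" does not, which is the kind of signal the abstract describes as indicating that two classes are near each other.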

* Sylvain Iloga [email protected]
Olivier Romain [email protected]
Maurice Tchuenté [email protected]

(1) Department of Computer Science, Higher Teachers’ Training College, University of Maroua, P.O. Box 55, Maroua, Cameroon
(2) ENSEA, CNRS, ETIS UMR 8051, CY Cergy Paris University, 95000 Cergy, France
(3) Department of Computer Science, Faculty of Science, University of Yaoundé I, P.O. Box 812, Yaoundé, Cameroon
(4) IRD, UMMISCO, University of Sorbonne, 93143 Bondy, France

1 Introduction

A taxonomy is an essential tool for fast information retrieval and classification of knowledge [1]. A taxonomy provides an efficient navigation and browsing mechanism by organizing huge volumes of data into a relatively small number of hierarchical clusters [2]. In this hierarchy, broad concepts are at the top and more specific concepts are further down [3]. Initially introduced for the classification of biological species, the concept of taxonomy is now used in several other areas related to the mining and processing of data (text, audio, image and video). One of the most important tasks in taxonomy generation is the evaluat