An efficient generic approach for automatic taxonomy generation using HMMs


THEORETICAL ADVANCES

Sylvain Iloga1,2,4 · Olivier Romain2 · Maurice Tchuenté3,4

Received: 5 March 2020 / Accepted: 9 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

Taxonomies are essential tools for fast information retrieval and classification of knowledge. Many existing techniques for automatic taxonomy generation depend strongly on the specific properties of a particular domain and are consequently hard to apply to other domains. Some attempts have been made to design taxonomies for multiple domains. Unfortunately, they induce high hierarchical classification error rates for some datasets. The automatic design of a taxonomy requires the capability of measuring the similarity between classes. More precisely, the fact that two classes are near each other intuitively implies that some elements of one class are scattered in the neighborhood of some elements of the other class. This observation is used in this paper to propose a new generic technique for automatic taxonomy generation. A topological analysis of the neighborhood of each instance is first performed. The results of this analysis are used to initialize and train a hidden Markov model for each class. The model of a given class c captures the frequencies of the classes found in the neighborhood of the instances of c, from the most dominant class to the least dominant. The similarities between these models are finally used to derive a taxonomy. Hierarchical classification experiments conducted on 20 datasets from various domains showed an average accuracy of 97.22% and a standard deviation of 4.11%. Comparison results revealed that the proposed approach outperforms existing work, with accuracy gains reaching 38.62% for one dataset.

Keywords Automatic taxonomy generation · Hidden Markov models · Hierarchical classification
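The neighborhood intuition stated in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the hidden Markov model step is replaced here by plain frequency profiles, and the function names `neighborhood_profiles` and `class_proximity`, the Euclidean distance, the parameter `k` and the toy data are all assumptions introduced for illustration only.

```python
from collections import Counter
from math import dist


def neighborhood_profiles(points, labels, k=3):
    """For each class c, return the relative frequency of every class found
    among the k nearest neighbors of the instances of c (most dominant first).
    Hypothetical helper: plain counting, no HMM involved."""
    counts = {c: Counter() for c in set(labels)}
    for i, p in enumerate(points):
        # Rank all other instances by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        for j in others[:k]:
            counts[labels[i]][labels[j]] += 1
    profiles = {}
    for c, ctr in counts.items():
        total = sum(ctr.values())
        profiles[c] = {d: n / total for d, n in ctr.most_common()}
    return profiles


def class_proximity(profiles, a, b):
    """Symmetric proximity (an assumed measure): average share of each class
    in the neighborhoods of the other class."""
    return 0.5 * (profiles[a].get(b, 0.0) + profiles[b].get(a, 0.0))


# Toy data (assumed): classes A and B are adjacent, class C is far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (2, 0), (2, 1), (3, 0), (3, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

profiles = neighborhood_profiles(points, labels, k=3)
print(class_proximity(profiles, "A", "B"), class_proximity(profiles, "A", "C"))
```

On this toy data, A and B obtain a higher proximity than A and C, so a taxonomy built from such proximities would group A and B under a common parent before attaching C.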

1 Introduction

A taxonomy is an essential tool for fast information retrieval and classification of knowledge [1]. A taxonomy provides an efficient navigating and browsing mechanism by organizing huge volumes of data into a relatively small number of hierarchical clusters [2]. In this hierarchy, broad concepts are at the top and more specific concepts are further down [3]. Initially introduced for the classification of biological species, the concept of taxonomy is nowadays used in several other areas related to data (text, audio, image and video) mining and processing. One of the most important tasks in taxonomy generation is the evaluat

* Sylvain Iloga [email protected]
Olivier Romain [email protected]
Maurice Tchuenté [email protected]

1 Department of Computer Science, Higher Teachers’ Training College, University of Maroua, P.O. Box 55, Maroua, Cameroon
2 ENSEA, CNRS, ETIS UMR 8051, CY Cergy Paris University, 95000 Cergy, France
3 Department of Computer Science, Faculty of Science, University of Yaoundé I, P.O. Box 812, Yaoundé, Cameroon
4 IRD, UMMISCO, University of Sorbonne, 93143 Bondy, France
