Probabilistic Topic Models for Enriching Ontology from Texts


ORIGINAL RESEARCH

Anis Tissaoui1 · Salma Sassi1 · Richard Chbeir2

Received: 26 November 2019 / Accepted: 28 September 2020
© Springer Nature Singapore Pte Ltd 2020

Abstract

The ontology enrichment process is text-based, and the application domain at hand is circumscribed by the content of the related texts. However, the main challenge in ontology enrichment is learning, since there is still a lack of approaches able to achieve automatic enrichment from a textual corpus or dataset covering various topics. In this paper, we describe a new approach for automatically learning terminological ontologies from a textual corpus based on probabilistic models. Our approach explores two topic modeling algorithms, namely LDA and pLSA, for learning a topic ontology. The objective is to capture the semantic relationships between words and topics and between topics and documents in terms of probability distributions, in order to build a topic ontology and an ontology graph with minimum human intervention. Experimental analysis on building a topic ontology and retrieving the corresponding topic ontology for a user query demonstrates the effectiveness of the proposed approach.

Keywords  Knowledge acquisition · Ontology enrichment · Ontology learning · Probabilistic topic models
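To make the word-topic and topic-document distributions mentioned in the abstract concrete, the following is a minimal sketch of pLSA fitted by expectation-maximization on a toy document-term count matrix. It is an illustration only and not the authors' implementation: the function name `plsa`, the toy vocabulary, the counts, the number of topics, and the iteration count are all assumptions chosen for the example.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-term count matrix (list of lists).

    Returns (p_w_z, p_z_d): p_w_z[z][w] estimates P(w|z) and
    p_z_d[d][z] estimates P(z|d). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random strictly positive initialisation; every row sums to 1.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                for z in range(n_topics):
                    weight = n_dw * post[z] / norm
                    new_w_z[z][w] += weight
                    new_z_d[d][z] += weight
        # M-step: renormalise the expected counts (tiny smoothing keeps
        # every probability strictly positive and avoids division by zero).
        p_w_z = [normalize([x + 1e-12 for x in row]) for row in new_w_z]
        p_z_d = [normalize([x + 1e-12 for x in row]) for row in new_z_d]
    return p_w_z, p_z_d


# Toy corpus (assumed data): rows are documents, columns follow `vocab`.
vocab = ["ontology", "concept", "relation", "topic", "word", "corpus"]
counts = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 1, 0],
    [0, 0, 0, 4, 3, 3],
    [0, 1, 0, 3, 4, 2],
]
p_w_z, p_z_d = plsa(counts, n_topics=2)

for z, row in enumerate(p_w_z):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(f"topic {z}:", [word for _, word in top])
for d, row in enumerate(p_z_d):
    print(f"doc {d}:", [round(p, 3) for p in row])
```

Each row of `p_w_z` is a distribution over the vocabulary for one topic, and each row of `p_z_d` is a distribution over topics for one document; these are the two kinds of probability distributions from which a topic ontology could then be derived.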

This article is part of the topical collection “Web for Information and Knowledge Exploration, Sharing and Security (Section 1: Web2Touch)” guest edited by Haider Abbas, Hammad Afzal, Rodrigo Bonacin, Ismail Bouassida, Khalil Drira, Riccardo Martoglia, Olga Nabuco, and Fatiha Saïs.

* Anis Tissaoui [email protected]
Salma Sassi [email protected]
Richard Chbeir richard.chbeir@univ-pau.fr

1 VPNC Lab., FSJEG, University of Jendouba, Avenue Union Maghreb Arabe, 8189 Jendouba, Tunisia
2 Univ Pau et Pays Adour, E2S UPPA, LIUPPA EA3000, 64600 Anglet, France

Introduction

Most ontology engineering methodologies consist of four phases: (a) feasibility study, (b) requirements analysis, (c) conceptualization, and (d) deployment, evaluation, and maintenance of the ontology. Conceptualization, which we address in this research, is one of the trickiest phases and can be divided into the development of the domain model, the formalization of the model, and its implementation in a certain ontology language. With this in mind, knowledge bases can be created by extracting the relevant instances from information to populate the corresponding ontologies, a process known as ontology population or knowledge markup. There are several methodologies for building ontologies (Seven-Step method [43], Knowledge Engineering method [33], METHONTOLOGY [22], TOVE [26], Software Engineering [41], and NeON [29]). However, the amount of data generated daily by human activities has increased so significantly that it has become impossible to process it manually. Thus, the tools used for document modeling have traditionally faced major challenges related to an overwhelming amount of unstructured or semi-struct