Semantically-enhanced information retrieval using multiple knowledge sources
Yuncheng Jiang
Received: 3 April 2019 / Revised: 27 December 2019 / Accepted: 27 January 2020
Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Classical Information Retrieval (IR) approaches rely on word-based representations of the query and of the documents in the collection. The user’s information need is specified entirely by the words in the original query, and retrieval returns the documents containing those words. Such approaches are limited by the absence of relevant keywords and by term variation between documents and the user’s query. This paper presents a new method for Semantic Information Retrieval (SIR) that addresses these limitations. Concretely, we propose SIRWWO (Semantic Information Retrieval using Wikipedia, WordNet, and domain Ontologies), a novel SIR method that combines multiple knowledge sources: Wikipedia, WordNet, and Description Logic (DL) ontologies. To illustrate SIRWWO, we first present the notion of a Labeled Dynamic Semantic Network (LDSN), which extends the notions of dynamic semantic network and extended semantic net based on WordNet (and the DAML ontology library). From the LDSN we derive the notion of a Weighted Dynamic Semantic Network (WDSN; intuitively, each edge in a WDSN is assigned a number in the interval [0, 1]) and give a method for constructing the WDSN using Wikipedia, WordNet, and DL ontologies. We then propose a novel metric for measuring the semantic relatedness between concepts based on the WDSN. Lastly, we investigate SIRWWO by using the semantic relatedness between users’ query keywords and digital documents. The experimental results show that our proposals achieve performance comparable to or better than that of the traditional IR system Lucene.
Keywords Information retrieval · Keyword search · Semantic relatedness · Multiple knowledge sources
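The idea of ranking documents by relatedness over a weighted network can be pictured with a small sketch. This is not the SIRWWO method or its metric, which the abstract does not specify. It is a minimal illustration under stated assumptions: a hand-made network whose edge weights lie in [0, 1], a relatedness score assumed to be the best product of edge weights along any path between two concepts, and a document score assumed to be the average best relatedness of each query keyword to any document term. All concept names, weights, documents, and function names are hypothetical.

```python
import heapq

# Hypothetical toy network: concepts and weights are invented for
# illustration and are not taken from the paper.
EDGES = {
    ("car", "automobile"): 0.9,
    ("automobile", "vehicle"): 0.8,
    ("vehicle", "bicycle"): 0.5,
    ("car", "engine"): 0.6,
}

# Hypothetical documents, each reduced to a list of terms.
DOCS = {
    "d1": ["automobile", "engine"],
    "d2": ["bicycle"],
    "d3": ["weather"],
}


def build_graph(edges):
    """Turn undirected weighted edges into an adjacency map."""
    graph = {}
    for (a, b), weight in edges.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError("edge weights must lie in [0, 1]")
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def relatedness(graph, source, target):
    """Best path score between two concepts, where a path's score is the
    product of its edge weights (an assumed metric, not the paper's)."""
    if source == target:
        return 1.0
    if source not in graph or target not in graph:
        return 0.0
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if node == target:
            return score
        if score < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            candidate = score * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return 0.0


def score_document(graph, query_terms, doc_terms):
    """Average, over query terms, of the best relatedness to any document term."""
    if not query_terms or not doc_terms:
        return 0.0
    total = sum(
        max(relatedness(graph, q, d) for d in doc_terms) for q in query_terms
    )
    return total / len(query_terms)


def rank(graph, query_terms, docs):
    """Return (score, doc_id) pairs sorted from most to least related."""
    scored = [
        (score_document(graph, query_terms, terms), doc_id)
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, doc_id in rank(build_graph(EDGES), ["car"], DOCS):
        print(doc_id, round(score, 2))
```

In this toy setup none of the documents contains the query word "car", so exact keyword matching would retrieve nothing, while the relatedness-based score still orders the documents by how closely their terms connect to the query.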
Yuncheng Jiang, School of Computer Science, South China Normal University, Guangzhou 510631, China
1 Introduction The production of digital content such as documents and Web pages is currently one of the most rapidly growing processes in the information age. This creates a plethora of information, with attendant problems in organizing, managing, and searching digital document catalogs. One of the most representative examples of this scenario is the World Wide Web [53]. To handle this abundance of digital content, Information Retrieval (IR) and related theories and technologies for the acquisition, management, and application of digital content have become an important issue [9, 12]. IR is currently a scientific research field concerned with the design of models and techniques for selecting relevant information in response to user queries within a collection (corpus) of documents [13]. Especially, keyword search engi