Cluster-based information retrieval using pattern mining


Youcef Djenouri¹ · Asma Belhadi² · Djamel Djenouri³ · Jerry Chun-Wei Lin⁴

Accepted: 1 September 2020 © The Author(s) 2020

Abstract This paper addresses the problem of responding to user queries by fetching the most relevant object from a clustered set of objects. It addresses the common drawbacks of cluster-based approaches and targets fast, high-quality information retrieval. For this purpose, a novel cluster-based information retrieval approach is proposed, named Cluster-based Retrieval using Pattern Mining (CRPM). This approach integrates various clustering and pattern mining algorithms. First, it groups similar objects into clusters. Three clustering algorithms, based on k-means, DBSCAN (density-based spatial clustering of applications with noise), and spectral clustering, are suggested to minimize the number of terms shared among the clusters of objects. Second, frequent and high-utility pattern mining algorithms are applied to each cluster to extract the pattern bases. Third, the clusters of objects are ranked for every query. In this context, two ranking strategies are proposed: i) Score Pattern Computing (SPC), which calculates a score representing the similarity between a user query and a cluster; and ii) Weighted Terms in Clusters (WTC), which calculates a weight for every term and uses the relevant terms to compute the score between a user query and each cluster. Irrelevant information derived from the pattern bases is also used to deal with unexpected user queries. To evaluate the proposed approach, extensive experiments were carried out on two use cases: document and tweet corpora. The results showed that the designed approach outperformed traditional and cluster-based information retrieval approaches in terms of the quality of the returned objects while being very competitive in terms of runtime.

Keywords Information retrieval · Data mining · Cluster-based approaches · Frequent and high-utility pattern mining
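The three steps outlined in the abstract (cluster, mine a pattern base per cluster, rank clusters for a query) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clusters are assumed to be given already, the miner is a naive frequent-itemset counter rather than the algorithms used in the paper, and the scoring rule in `score_cluster` is a hypothetical stand-in, since the abstract does not give the SPC or WTC formulas. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_patterns(docs, min_support=2, max_size=2):
    """Naive frequent-itemset mining over one cluster.

    Counts every term set of size 1..max_size and keeps those that
    occur in at least min_support documents. Illustrative only.
    """
    counts = Counter()
    for terms in docs:
        unique = sorted(set(terms))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def score_cluster(query_terms, patterns):
    """Hypothetical score (an assumption, not the paper's SPC formula):
    sum of the supports of all patterns fully contained in the query."""
    query = set(query_terms)
    return sum(support for pattern, support in patterns.items()
               if pattern <= query)


def rank_clusters(query_terms, clusters, min_support=2):
    """Build one pattern base per cluster, then rank clusters by score."""
    bases = {cid: mine_frequent_patterns(docs, min_support)
             for cid, docs in clusters.items()}
    scores = {cid: score_cluster(query_terms, base)
              for cid, base in bases.items()}
    ranking = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ranking, scores


# Toy clusters, assumed to come from a prior clustering step.
clusters = {
    "sports": [["football", "goal", "match"],
               ["football", "match", "league"],
               ["goal", "match", "score"]],
    "tech": [["python", "code", "data"],
             ["data", "mining", "pattern"],
             ["python", "data", "mining"]],
}
query = ["data", "mining"]

ranking, scores = rank_clusters(query, clusters)
print(ranking, scores)
```

With this toy data the "tech" cluster ranks first, because the patterns {data}, {mining}, and {data, mining} are all frequent there and all match the query. In a retrieval setting, only the objects of the top-ranked clusters would then be searched.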

¹ Dept. of Mathematics and Cybernetics, SINTEF Digital, Oslo, Norway
² Kristiania University College, Oslo, Norway
³ Computer Science Research Center, Dept. of Computer Science and Creative Technology, University of the West of England, Bristol, UK
⁴ Dept. of Computing, Mathematics and Physics, Western Norway University of Applied Sciences (HVL), Bergen, Norway

1 Introduction

Data mining [1, 2] is an interdisciplinary field that deals with extracting information from large data sets and transforming it into an easily interpretable structure for further use. Information retrieval (IR) is the task of retrieving the information that is relevant to a user query (represented by a set of terms) from a collection of objects [3]. Several variants of the IR problem have been considered in the literature. For instance, document information retrieval (DIR) [4
