A Concept-Based Approach for Generating Better Topics for Web Search Results
ORIGINAL RESEARCH
N. Mehala 1 · Divyansh Bhatia 2
Received: 21 August 2019 / Accepted: 28 August 2020
© Springer Nature Singapore Pte Ltd 2020
Abstract
As the web is accessible to a vast population around the globe, users today pose a large number of queries to web search tools, often with dynamic, vague, and unclear intentions. As a consequence, organizing search results has become an increasingly challenging task. Because of such queries, it is difficult for web search tools to comprehend the exact user context, so they retrieve an extensive volume of results, a significant portion of which is unnecessary for the user. One answer to this problem is a strategy called search result clustering (SRC), which groups the search results and presents them to users as a set of options for the query. In this work, we propose an approach that first classifies the related topics and lays them out in the form of concepts, then builds search result clusters by assigning each search result to the relevant topic, and finally provides relevant labels for these topics. We examine the effectiveness of our approach by comparing it against the two most popular non-commercial methods in this field, Lingo and STC, on two standard datasets, ODP and Ambient, and a newly developed dataset, Ex-Ambient, which is a rigorously extended version of the Ambient dataset. We performed the analysis along both qualitative and quantitative dimensions. We define the qualitative dimension as the expressiveness of the generated cluster labels, while the quantitative dimension concerns the correctness of the documents assigned to each cluster. The experimental results of the proposed method were encouraging compared with Lingo and STC on all the datasets and along both dimensions.
Keywords Web search effectiveness · Internet · Query disambiguation · Topics · Concepts · Search result clustering
This article is part of the topical collection “Advances in Internet Research and Engineering” guest edited by Mohit Sethi, Debabrata Das, P. V. Ananda Mohan and Balaji Rajendran.
* Divyansh Bhatia
N. Mehala
1 PES University, Bengaluru, India
2 eBay Inc., San Jose, CA, USA
Introduction
The content of the web has grown at an exponential pace [1]; therefore, finding relevant information has become increasingly difficult. Existing search engines return a substantial number of results for a given search query. They try to order the results so that pages more relevant to the query appear first. This is one of the more difficult tasks for a search engine because of the large, diverse, and complex intentions behind a query. Moreover, queries may be short, imprecise, vague, and ambiguous. Approximately 30% of the queries submitted to search engines consist of a single word [2]. About 7%–23% of the queries occurring in the search logs of large web search systems are ambiguous [3,