Assisting cluster coherency via n-grams and clustering as a tool to deal with the new user problem


ORIGINAL ARTICLE

Christos Bouras • Vassilis Tsogkas

Received: 12 December 2013 / Accepted: 29 April 2014 / © Springer-Verlag Berlin Heidelberg 2014

Abstract Collaborative filtering systems typically need to acquire some data about a new user before they can start making personalized suggestions, a situation commonly referred to as the "new user problem". In this work we attempt to address the new user problem via a personalized strategy for prompting the user with articles to rate. Our approach makes use of hypernyms extracted from the WordNet database and converges quickly to the actual user interests based on a minimal number of user ratings, which are provided during the registration process. In addition, we explore whether document clustering results, and in particular the clustering of news articles from the web, can be enhanced by using word-based n-grams during the keyword extraction phase. We present and evaluate a weighting approach for clustering news articles from the web that combines keywords with n-grams extracted from the articles at an offline stage. This technique is then compared with the plain "bag-of-words" representation that our clustering algorithm, W-kmeans, previously used. Our experiments reveal that by fine-tuning the weighting parameters between keywords and n-grams, as well as the value of n itself, a significant improvement in the clustering result metrics can be achieved.

Keywords New user problem · Collaborative filtering · Clustering · W-kmeans · K-means · Personalized strategy · n-grams · Text preprocessing
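
The abstract describes combining single keywords with word-based n-grams through tunable weights. The following is a minimal sketch of that general idea only, not the authors' W-kmeans implementation: the function names, the tokenizer, and the default values of `n`, `keyword_weight`, and `ngram_weight` are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def word_ngrams(tokens, n):
    """Return every contiguous word n-gram as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weighted_features(text, n=2, keyword_weight=0.7, ngram_weight=0.3):
    """Build a term -> weight mapping from keywords and word n-grams.

    The weights are hypothetical tuning parameters: each keyword count is
    scaled by keyword_weight and each n-gram count by ngram_weight.
    """
    tokens = tokenize(text)
    features = Counter()
    for term, count in Counter(tokens).items():
        features[term] += keyword_weight * count
    for gram, count in Counter(word_ngrams(tokens, n)).items():
        features[gram] += ngram_weight * count
    return dict(features)
```

The resulting dictionaries could be turned into vectors and passed to any clustering routine; varying `n` and the two weights corresponds to the kind of parameter tuning the abstract mentions.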

C. Bouras • V. Tsogkas
Computer Engineering and Informatics Department, University of Patras, Patras, Greece
e-mail: [email protected]

C. Bouras (✉)
Computer Technology Institute and Press "Diophantus", Rion, 26500 Patras, Greece
e-mail: [email protected]

1 Introduction

Every day, more and more news articles, books, journals, research papers, web pages, and movies are being made available online. As the volume of available information grows, we quickly become overwhelmed and seek assistance in finding the most interesting, valuable, or entertaining items on which to spend our scarce time. Historically, humans have adapted well to pieces of information and have developed an excellent filtering ability to make quick judgments. Three technologies are commonly used to address these information overload challenges. Each one of them focuses primarily on a particular set of tasks or questions:

• Information Retrieval (IR), which focuses on tasks involving fulfilling ephemeral interest queries, such as finding the articles related to President Obama

• Information Filtering (IF), which focuses on tasks involving classifying streams of new content into categories, such as finding any newly released articles regarding the political situation in the Middle East, or any newly released movies w
