An LDA-Based Approach to Scientific Paper Recommendation



1 Department of Informatics, Systems and Communication, University of Milano Bicocca, Milan, Italy
{amami,pasi,stella}@disco.unimib.it
2 LARODEC, ISG, University of Tunis, Tunis, Tunisia
3 LARODEC, IHEC, University of Carthage, Tunis, Tunisia
[email protected]

Abstract. Recommendation of scientific papers is a task aimed at supporting researchers in accessing relevant articles from a large pool of unseen articles. When writing a paper, a researcher focuses on the topics related to her/his scientific domain, using a technical language. The core idea of this paper is to exploit the topics related to the researcher's scientific production (authored articles) to formally define her/his profile; in particular, we propose to employ topic modeling to formally represent the user profile, and language modeling to formally represent each unseen paper. The recommendation technique we propose relies on assessing the closeness between the language used in the researcher's papers and the language employed in the unseen papers. The proposed approach exploits a reliable knowledge source for building the user profile, and it alleviates the cold-start problem typical of collaborative filtering techniques. We also present a preliminary evaluation of our approach on DBLP.

Keywords: Content-based recommendation · Scientific papers recommendation · Researcher profile · Topic modeling · Language modeling

1 Introduction

© Springer International Publishing Switzerland 2016. E. Métais et al. (Eds.): NLDB 2016, LNCS 9612, pp. 200–210, 2016. DOI: 10.1007/978-3-319-41754-7_17

In recent years a great deal of research has addressed the problem of scientific paper recommendation. This problem has become more and more compelling due to the information overload suffered by several categories of users, including the scientific community. Indeed, the increasing number of scientific papers published every day means that a researcher spends a lot of time finding publications relevant to her/his research interests. In this context, recommender systems serve the purpose of providing researchers with direct recommendations of content that is likely to fit their needs. Most approaches in the literature have addressed this problem by means of collaborative filtering (CF) techniques, which evaluate items (in this case papers) based on the behavior of other users (researchers), by exploiting the ratings assigned by other users to the considered items. However, CF approaches generally assume that the number of users is much larger than the number of items [9]. This holds in applications like movie recommendation, where there are usually relatively few items and many users. For instance, the MovieLens 1M¹ dataset contains 1,000,209 ratings from 6,040 users on 3,706 movies [8]. Moreover, the users are clients who are very likely to interact with the system several times, often consuming similar items; therefore ratings are quite easy to obtain. Hence, CF recommendation models can