Navigation-based candidate expansion and pretrained language models for citation recommendation


Rodrigo Nogueira1,2 · Zhiying Jiang2 · Kyunghyun Cho3,4,5,6 · Jimmy Lin2

Received: 19 May 2020
© Akadémiai Kiadó, Budapest, Hungary 2020

Abstract Citation recommendation systems for the scientific literature, which help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration. We treat this task as a ranking problem, which we tackle with a two-stage approach: candidate generation followed by reranking. Within this framework, we adapt to the scientific domain a proven combination based on “bag of words” retrieval followed by rescoring with a BERT model. We experimentally show the effects of domain adaptation, both in terms of pretraining on in-domain data and exploiting in-domain vocabulary. In addition, we introduce a novel navigation-based document expansion strategy to enrich the candidate documents fed into our neural models. On three benchmark datasets, our methods achieve or rival the state of the art in the citation recommendation task.

Keywords Transformers · Domain adaptation · Citation graph
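The two-stage approach described in the abstract (candidate generation followed by reranking) can be illustrated with a minimal sketch. This is not the authors' implementation. The BM25 parameters `k1=0.9` and `b=0.4`, the whitespace tokenizer, and all function names are assumptions made for illustration. `overlap_scorer` is a hypothetical stand-in for the BERT rescoring model, which in practice would score each query and candidate pair with a fine-tuned neural network.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def generate_candidates(query, docs, k):
    """Stage 1: keep the top-k documents with a non-zero bag-of-words score."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] > 0]


def rerank(query, docs, candidates, scorer):
    """Stage 2: reorder the candidates with a (usually more expensive) scorer."""
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


def overlap_scorer(query, doc):
    # Hypothetical placeholder for a neural reranker: Jaccard term overlap.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0
```

In this sketch the first stage is cheap and recall-oriented, so the second stage only has to score the `k` surviving candidates. That is the usual motivation for pairing bag-of-words retrieval with a costlier reranking model.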

Introduction The volume of scientific publications is growing at an incredible rate. For example, nearly a million articles are added per year to MEDLINE, a bibliographic database of the life sciences and biomedical literature.1 A recent study estimates that three million papers are published annually in the English language, with a growth rate of 3–5% per year (Johnson et al. 2018). This flood of information has made it nearly impossible for researchers to keep abreast of discoveries and innovations, both in their specific sub-field and more broadly. Furthermore, there is an overwhelming amount of material that a scientist entering a new field of study needs to read before becoming familiarized with common concepts, methods, and other foundations. A number of tools have come along to help researchers cope with this deluge. For example, keyword-based literature search engines (Google Scholar, Microsoft Academic, PubMed, and Semantic Scholar) and citation recommendation tools (Bollacker et al. 1999; Basu et al. 2001; McNee et al. 2002; Kodakateri Pudhiyaveetil et al. 2009; He et al. 2010) help scientists find relevant articles, often exploiting citation networks to identify what’s important in a particular field. Methods to automatically populate scientific knowledge bases (Gao et al. 2006; Spangler

1 https://www.nlm.nih.gov/bsd/stats/cit_added.html

* Rodrigo Nogueira [email protected]
1 Tandon School of Engineering, New York University, New York, USA
2 David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
3 Courant Institute of Mathematical Sciences, New York University, New York, USA
4 Center for Data Science, New York University, New York, USA
5 Facebook AI Research, New York, USA
6 CIFAR Azrieli Global Scholar, Toronto, Canada

Scientometrics
