Scholarly literature mining with information retrieval and natural language processing: Preface



Guillaume Cabanac¹ · Ingo Frommholz² · Philipp Mayr³
Received: 9 October 2020 / Published online: 17 November 2020
© Akadémiai Kiadó, Budapest, Hungary 2020

Introduction

This special issue features the work of authors originally coming from different communities: bibliometrics/scientometrics (SCIM), information retrieval (IR) and, as an emerging player gaining more relevance for both aforementioned fields, natural language processing (NLP). The work presented in their papers combines ideas from all these fields; the papers have in common that they all use the scholarly data well known in scientometrics and solve problems typical of scientometric research. They model and mine citations, as well as metadata of bibliographic records (authorships, titles, sometimes abstracts), which is common practice in SCIM. They also mine and process full texts (including in-text references and equations), which is common practice in IR and requires established NLP text mining techniques. IR collections are utilised to ensure reproducible evaluations; creating and sharing test collections in evaluation initiatives such as CLEF eHealth¹ is a common IR tradition that is also prominent in NLP, e.g., in the CL-SciSumm shared task.²

From an IR perspective, surprisingly, scholarly information retrieval and recommendation, though gaining momentum, have not always been the focus of research in the past. Besides offering a rich set of data for researchers in all three disciplines to work with, scholarly search poses challenges, in particular for IR, because its complex information needs require different approaches than those known from, e.g., Web search, where information needs are simpler in many cases. As an example, the current COVID-19 crisis shows that hybrid SCIM/IR/NLP approaches are increasingly required to ensure researchers get access

1 https://clefehealth.imag.fr.
2 https://github.com/WING-NUS/scisumm-corpus.

* Guillaume Cabanac, guillaume.cabanac@univ-tlse3.fr
Ingo Frommholz, [email protected]
Philipp Mayr, [email protected]

1 Computer Science Department, IRIT UMR 5505 CNRS, University of Toulouse, 118 Route de Narbonne, 31062 Toulouse Cedex 9, France
2 University of Bedfordshire, Luton LU1 3JU, UK
3 GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany




Scientometrics (2020) 125:2835–2840

to important, relevant and high-quality information, often only available on preprint servers, in a short period of time (Brainard 2020; Fraser et al. 2020; Kwon 2020; Palayew et al. 2020). These kinds of complex information needs pose challenges that have been recognised by the Information Retrieval community, which quickly launched the TREC-COVID initiative run by NIST (Roberts et al. 2020), demonstrating the timeliness of our endeavour and this special issue. Working on scholarly material thus has incentives for researchers in Information Retrieval, but we believe the challenges can