Towards Computation of Novel Ideas from Corpora of Scientific Text


1 School of Computer Science, University of Nottingham Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor Darul Ehsan, Malaysia
{khyx3lhi,tim.brailsford}@nottingham.edu.my
2 Horizon Digital Economy Research, School of Computer Science, University of Nottingham, Nottingham NG7 2TU, UK
[email protected]

Abstract. In this work we present a method for the computation of novel ‘ideas’ from corpora of scientific text. The system functions by first detecting concept noun-phrases within the titles and abstracts of publications using Part-Of-Speech tagging, before classifying these into sets of problem and solution phrases via a target-word matching approach. By defining an idea as a co-occurring pair, known-idea triples can be constructed through the additional assignment of a relevance value (computed via either phrase co-occurrence or an ‘idea frequency-inverse document frequency’ score). The resulting triples are then fed into a collaborative filtering algorithm, where problem-phrases are considered as users and solution-phrases as the items to be recommended. The final output is a ranked list of novel idea candidates, which hold potential for researchers to integrate into their hypothesis generation processes. This approach is evaluated using a subset of publications from the journal Science, with precision, recall and F-Measure results for a variety of model parametrizations indicating that the system is capable of generating useful novel ideas in an automated fashion.

Keywords: Idea mining · Text mining · Natural language processing · Recommender systems · Collaborative filtering
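The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the problem and solution phrases have already been extracted, uses raw co-occurrence counts as the relevance value, and substitutes a simple user-based neighbourhood method with cosine similarity for the collaborative filtering step, whose exact form is not given here. The toy `documents` data and the function names `build_triples`, `cosine` and `recommend` are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy corpus: each document lists the problem and solution
# phrases assumed to have been detected in its title and abstract.
documents = [
    {"problems": ["protein folding"], "solutions": ["molecular dynamics", "neural network"]},
    {"problems": ["image segmentation"], "solutions": ["neural network", "graph cut"]},
    {"problems": ["protein folding"], "solutions": ["molecular dynamics"]},
    {"problems": ["climate prediction"], "solutions": ["neural network"]},
]


def build_triples(docs):
    """Map each co-occurring (problem, solution) pair to a co-occurrence count."""
    counts = defaultdict(int)
    for doc in docs:
        for problem in set(doc["problems"]):
            for solution in set(doc["solutions"]):
                counts[(problem, solution)] += 1
    return dict(counts)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(triples, problem, top_n=5):
    """Rank solutions not yet paired with `problem`.

    Problems play the role of users and solutions the role of items.
    This is an assumed user-based neighbourhood scheme, chosen for brevity.
    """
    profiles = defaultdict(dict)
    for (p, s), relevance in triples.items():
        profiles[p][s] = relevance
    target = profiles.get(problem, {})
    scores = defaultdict(float)
    for other, profile in profiles.items():
        if other == problem:
            continue
        sim = cosine(target, profile)
        if sim == 0.0:
            continue
        for solution, relevance in profile.items():
            if solution not in target:  # keep only novel pairings
                scores[solution] += sim * relevance
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


triples = build_triples(documents)
candidates = recommend(triples, "climate prediction")
```

Here `candidates` holds the ranked novel pairings for one problem phrase. Swapping the co-occurrence count for another relevance score would only change `build_triples`.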

1 Introduction

The process of attacking problems by first canvassing participants for spontaneous ideas, collating their responses and distilling the results, is often referred to as brainstorming. The term, as popularized by Osborn [26] and expanded upon by Kling [22] and Jessop [19], now corresponds to a well-known set of guidelines for generating creative solutions that entail: discussion of the problem; unconstrained consideration as to how best to solve the problem; screening of the contributions; and, finally, commitment to action. While this approach to problem solving has traditionally required active human participation, in this paper we explore the following challenge: given the inordinate amount of scientific literature now accessible via the web, is it possible to automate the brainstorming process via machine learning?

© Springer International Publishing Switzerland 2015
A. Appice et al. (Eds.): ECML PKDD 2015, Part II, LNAI 9285, pp. 541–556, 2015. DOI: 10.1007/978-3-319-23525-7_33


H. Liu et al.

While the idea of supporting the ideation process via technology is not new (the term Computer-Assisted Brainstorming was coined three decades ago [17]), prior research has focussed on visualization tools, organizational applications and associated Human-Computer Interaction challenges [5,6,14]. However, text mining and computational linguistic techniques have now progressed to the point that notions of automatical
