Artificial Intelligence Research Letter
A picture is worth a thousand words: applying natural language processing tools for creating a quantum materials database map
Vineeth Venugopal, Scott R. Broderick, and Krishna Rajan, Department of Materials Design and Innovation, University at Buffalo, Buffalo, NY, USA
Address all correspondence to Krishna Rajan at [email protected]
(Received 15 February 2019; accepted 23 September 2019)
Abstract
This paper demonstrates the application of Natural Language Processing (NLP) tools to explore large libraries of documents and to draw heuristic associations between text descriptions in figure captions and interpretations of images and figures. Visualization tools based on NLP methods permit one to quickly assess the extent of the research described in the literature on a specific topic. The authors demonstrate how applying NLP methods to the figure captions alone, without having to navigate the entire text of a document, can provide an accelerated assessment of the literature in a given domain.
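As a rough illustration of what a caption-only analysis can look like, the sketch below counts how often pairs of terms appear together in the same figure caption. It is not the authors' pipeline: the captions, the stop-word list, and the function names `tokenize` and `cooccurrence` are all hypothetical, chosen only to show the general idea of correlating terms across captions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical captions, invented for illustration only (not from the paper's corpus).
CAPTIONS = [
    "Temperature dependence of the resistivity of the thin film.",
    "ARPES spectra showing the band structure near the Fermi level.",
    "Resistivity versus temperature under applied magnetic field.",
    "Calculated band structure and density of states.",
]

# A tiny, assumed stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"of", "the", "and", "under", "near", "versus", "showing"}


def tokenize(caption):
    """Lowercase a caption and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", caption.lower()) if t not in STOP_WORDS]


def cooccurrence(captions):
    """Count, for each unordered pair of distinct terms, the captions containing both."""
    pairs = Counter()
    for caption in captions:
        terms = sorted(set(tokenize(caption)))  # sorted so each pair has one canonical order
        pairs.update(combinations(terms, 2))
    return pairs


if __name__ == "__main__":
    # Print the three most frequent term pairs across the toy captions.
    for (a, b), n in cooccurrence(CAPTIONS).most_common(3):
        print(a, b, n)
```

Pairs that recur across many captions, such as a measured property and the variable it is plotted against, hint at which techniques and properties a body of literature covers, which is the kind of association the abstract describes.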
Introduction
The continuous improvement in the power of Natural Language Processing (NLP) tools has spawned many studies that navigate the materials science literature, and the approach has been demonstrated for selected use cases.[1–10] Variational autoencoders, for example, have been suggested as a way to derive specific processing parameters from the existing scientific literature.[11] Specific processing–property relationships have been demonstrated through tailored entity extraction tools such as ChemDataExtractor,[12] which has been used to automatically populate thermal and magnetic databases for some materials.[13]

In this paper, we introduce another genre of application that uses NLP to interrogate and harness the knowledge embedded in documents in the materials science literature. Figures exist in many different forms, and much of the interpretation of processing–structure–property relationships rests on the ability to identify at a glance what a figure quantitatively provides. While someone with domain expertise can provide this interpretation, scaling it up to hundreds of thousands of figures and images is a very different challenge. The aim in applying NLP tools should not simply be to track where words occur or to count their frequency, but rather to capture the more subjective relationships that drive our ability to read the literature and to connect images to the text.

As a case study for the development of NLP tools and the knowledge gain possible from them, we use quantum materials as our platform. We use NLP tools to identify correlations between the text in figure captions within the quantum materials literature, providing guidance on the types of techniques and properties that have been explored. This provides a mapping and compression of the information in the quantum materials area, and also provides guidance on potential data sources or types of images for domain experts to use in uncovering structure–property relationships.

The term “quantum materials” covers an incredib