A data ecosystem to support machine learning in materials science

Artificial Intelligence Research Letter

Ben Blaiszik and Logan Ward, Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Marcus Schwarting, Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Jonathon Gaff, Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA
Ryan Chard, Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA
Daniel Pike, Department of Computer Science, Cornell University, Ithaca, NY, USA
Kyle Chard and Ian Foster, Globus, Department of Computer Science, University of Chicago, Chicago, IL, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA

Address all correspondence to Ben Blaiszik at [email protected]
(Received 12 April 2019; accepted 23 August 2019)

Abstract

Facilitating the application of machine learning (ML) to materials science problems requires enhancing the data ecosystem to enable discovery and collection of data from many sources, automated dissemination of new data across the ecosystem, and the connecting of data with materials-specific ML models. Here, we present two projects, the Materials Data Facility (MDF) and the Data and Learning Hub for Science (DLHub), that address these needs. We use examples to show how MDF and DLHub capabilities can be leveraged to link data with ML models and how users can access those capabilities through web and programmatic interfaces.

Introduction

A growing opportunity exists for the materials science community to leverage and build upon the advances in machine learning (ML) and artificial intelligence (AI) that are reorienting and reorganizing industries across the economy. In materials science, there is well-founded optimism that such advances may allow for a greatly increased rate of discovery, development, and deployment of novel materials, bringing researchers closer to realizing the vision of the Materials Genome Initiative.[1] However, despite considerable growth in the number of materials datasets and the volume of data available, researchers continue to lack easy access to high-quality machine-readable data of sufficient volume and breadth to solve many interesting problems. They also struggle with growing diversity and complexity in the data science and learning software required to apply ML and AI techniques to materials problems: software that includes not only materials-specific tools but also a wide range of other data transformation, data analysis, and ML/AI components, many not designed specifically for materials problems. Seizing the opportunity of ML and AI for materials discovery thus requires not just more and better data and software; it requires new approaches to navigating and combining data sources and tools that allow researchers to easily discover, access, in