The machine learning revolution in materials?
Over a span of a decade, efforts such as the Materials Genome Initiative¹ (MGI) and the Integrated Computational Materials Engineering² (ICME) programs have seen large successes in the creation of a materials data and computational infrastructure. Large-scale systems have been developed and populated to allow for easy access to volumes of experimental and simulation data. These data, and the associated infrastructure that has been built to ingest, store, and manipulate them, are not ends in and of themselves, however. Instead, they are the means to the ambitious goal of accelerated materials design and discovery. A key question, therefore, is how best to use such data to achieve this goal.
In addition to the data and infrastructure, we must also consider the various techniques and methods used to analyze, interact with, or otherwise draw conclusions from those data. Indeed, with such a data infrastructure in place, data-driven techniques and analysis have already become commonplace within the materials science community. In the past few years, these data analytic techniques have enjoyed an evolution in accuracy, robustness, and utility comparable to the development of the materials data infrastructure itself.
The data analytic techniques employed by the materials community are broad and diverse. Exploratory data analysis and visualization techniques allow us to “look” at our data (a requisite first step in any analysis) to identify qualitative trends or outliers. Statistical regression techniques learn the mappings from material descriptors to properties or device performance. Unsupervised methods such as clustering or dimensionality reduction algorithms allow us to find hidden connections between and within families of materials by examining high-dimensional representations of them and discovering structure within these representations. Combined, these techniques can help accelerate the design of materials with new or optimal properties by making predictions or discovering insights latent in the data.
One workflow, which we will broadly call virtual screening,³ involves collecting a large amount of numeric data specifying several descriptors for a set of materials, fitting a model to predict structure or properties of materials from these descriptors, and using this model to identify those materials that produce optimal structure or properties. Virtual screening can suggest material designs predicted to be optimal with respect to certain properties or structures. These designs are then tested in the laboratory by synthesizing the materials and characterizing their properties. Given sufficient amounts of data, virtual screening can accelerate the design of optimal materials and is perhaps one of the most popular data analytic workflows used in the materials community. Examples of its success include the design of organic light-emitting diodes,⁴ metal–organic frameworks,⁵ and drugs.⁶
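The three steps of virtual screening (collect descriptor data, fit a predictive model, rank candidates by predicted property) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the article: the two-component descriptors, the synthetic `hidden_property` function standing in for laboratory measurements, and the choice of a k-nearest-neighbor regressor as the model. The helper names `fit_knn` and `screen` are hypothetical.

```python
import math
import random


def fit_knn(descriptors, properties, k=3):
    """Return a predictor that averages the measured property over the
    k known materials whose descriptors are closest to the query."""
    def predict(x):
        ranked = sorted(
            zip(descriptors, properties),
            key=lambda pair: math.dist(pair[0], x),
        )
        nearest = ranked[:k]
        return sum(value for _, value in nearest) / len(nearest)
    return predict


def screen(predict, candidates, top_n=5):
    """Rank candidate descriptor vectors by predicted property, best first."""
    return sorted(candidates, key=predict, reverse=True)[:top_n]


# Step 1: collect data. Synthetic stand-in for measured materials
# (assumption: the property peaks at descriptor values (0.7, 0.2)).
def hidden_property(x):
    return -((x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2)


rng = random.Random(0)
known = [(rng.random(), rng.random()) for _ in range(200)]
measured = [hidden_property(x) for x in known]

# Step 2: fit a model mapping descriptors to the property.
model = fit_knn(known, measured, k=5)

# Step 3: score a grid of untested candidates and keep the most promising.
candidates = [(i / 20, j / 20) for i in range(21) for j in range(21)]
shortlist = screen(model, candidates, top_n=5)

for x in shortlist:
    print(x, round(model(x), 4))
```

In practice the shortlist would go to the laboratory for synthesis and characterization, and the new measurements could be appended to the known data before refitting. Any regression model could replace the nearest-neighbor predictor without changing the screening step.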
Kristofer G. Reyes, Department of Materials Design and Innovation, University at Buffalo, The State University of New York, USA; kreyes3@