Harnessing the Materials Project for machine-learning and accelerated discovery


Introduction

Accelerated data-driven learning aims to reveal hidden relationships in data sets, usually with the end goal of building predictive models that outperform traditional ones. Recently, machine-learning (ML) models have succeeded in solving long-standing computer science problems that had eluded direct solution. Examples of such breakthroughs1 include those in image recognition, speech recognition, and even “scientific” tasks such as analyzing particle accelerator data. The application of ML involves three steps: (1) curation of an input data set of sufficient size and quality, (2) mapping the input data to features/descriptors and targets, and (3) model fitting.

Several features make the Materials Project2 database uniquely suited for building ML models. First, the available data are varied and of high quality, both in the properties covered (e.g., formation energies, energies above the convex hull, bandgaps, elastic constants, and surface energies) and in chemical coverage (most known, unique, crystalline systems in the Inorganic Crystal Structure Database3 [ICSD]), and they are obtained in a self-consistent manner, which yields directly comparable properties across structure and chemistry. (The convex hull is the lowest-energy envelope of competing compounds and elements in a composition space; a compound whose energy lies on the hull is stable against decomposition into other compounds or elements in that space.) Second, the Materials Project makes great efforts to reduce duplication of data by ensuring the uniqueness of materials and computed properties, which enables fitted ML models to explore structure–property relations without being skewed toward specific, heavily studied structure spaces. This is becoming an increasingly important consideration as the Materials Project and similar databases4,5 now contain computational data sets ranging from on the order of 10³ materials6–8 (for computationally expensive property evaluations) to approximately 10⁴–10⁵ materials9,10 (for computationally inexpensive property evaluations). Finally, the Materials Project provides a robust Application Programming Interface11 as well as a
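The three steps listed in the Introduction (curation, mapping to features and targets, model fitting) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are made-up toy values rather than Materials Project data, the descriptor (mean atomic number) and the helper names `curate`, `featurize`, and `fit_line` are hypothetical, and a one-variable least-squares line stands in for whatever model a real study would use.

```python
"""Toy sketch of the three ML steps: curate, featurize, fit.

All records below are HYPOTHETICAL illustration values, not data from the
Materials Project or any other database.
"""
from statistics import mean

# Made-up records: an id, the atomic numbers of the constituent atoms,
# and an arbitrary target property value (None means "not available").
TOY_RECORDS = [
    {"id": "toy-1", "atomic_numbers": [1, 3], "target": 5.0},
    {"id": "toy-2", "atomic_numbers": [2, 6], "target": 9.0},
    {"id": "toy-1", "atomic_numbers": [1, 3], "target": 5.5},  # duplicate id
    {"id": "toy-3", "atomic_numbers": [5, 7], "target": 13.0},
    {"id": "toy-4", "atomic_numbers": [8, 8], "target": None},  # no target
]


def curate(records):
    """Step 1: keep the first record per id and drop missing targets."""
    seen, clean = set(), []
    for rec in records:
        if rec["target"] is None or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        clean.append(rec)
    return clean


def featurize(record):
    """Step 2: map a record to one numeric descriptor (mean atomic number)."""
    return mean(record["atomic_numbers"])


def fit_line(xs, ys):
    """Step 3: ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


clean_records = curate(TOY_RECORDS)
features = [featurize(rec) for rec in clean_records]
targets = [rec["target"] for rec in clean_records]
slope, intercept = fit_line(features, targets)
print(f"fitted model: target = {slope:.2f} * feature + {intercept:.2f}")
```

The curation step deliberately removes the duplicated `toy-1` entry, which mirrors the point made above about uniqueness: a repeated material would otherwise pull the fit toward one region of the data.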

Weike Ye, University of California, San Diego, USA; [email protected]
Chi Chen, University of California, San Diego, USA; [email protected]
Shyam Dwaraknath, Lawrence Berkeley National Laboratory, USA; [email protected]
Anubhav Jain, Lawrence Berkeley National Laboratory, USA; [email protected]
Shyue Ping Ong, University of California, San Diego, USA; [email protected]
Kristin A. Persson, University of California, Berkeley, and Lawrence Berkeley National Laboratory, USA; [email protected]
*denotes equal contribution.
doi:10.1557/mrs.2018.202


MRS BULLETIN • VOLUME 43 • SEPTEMBER 2018 • www.mrs.org/bulletin
© 2018 Materials Research Society


conductivity,9 solute diffusi