New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships


Geoffroy Hautier Institute of Condensed Matter and Nanosciences (IMCN), Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium

Shyue Ping Ong Department of NanoEngineering, University of California San Diego, La Jolla, California 92093, USA

Kristin Persson Energy and Environmental Technologies Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; and Materials Science and Engineering, University of California Berkeley, Berkeley, California 94720, USA

(Received 15 May 2015; accepted 17 February 2016)

Data mining has revolutionized sectors as diverse as pharmaceutical drug discovery, finance, medicine, and marketing, and has the potential to similarly advance materials science. In this paper, we describe advances in simulation-based materials databases, open-source software tools, and machine learning algorithms that are converging to create new opportunities for materials informatics. We discuss the data mining techniques of exploratory data analysis, clustering, linear models, kernel ridge regression, tree-based regression, and recommendation engines. We present these techniques in the context of several materials application areas, including compound prediction, Li-ion battery design, piezoelectric materials, photocatalysts, and thermoelectric materials. Finally, we demonstrate how new data and tools are making data mining easier and more accessible than ever, using a new analysis that learns trends in the valence and conduction band character of compounds in the Materials Project database from data on over 2500 compounds.

Contributing Editor: Susan B. Sinnott
DOI: 10.1557/jmr.2016.80

Materials science has traditionally been driven by scientific intuition followed by experimental study. In recent years, theory and computation have provided a secondary avenue for materials property prediction and design. Several successful examples of materials designed in a computer and then realized in the laboratory [1] have now established such methods as a new route for materials discovery and optimization. As computational methods approach maturity, new and complementary techniques based on statistical analysis and machine learning are poised to revolutionize materials science.

While the modern use of the term materials informatics dates back only a decade [2], the use of an informatics approach to chemistry and materials science is as old as the periodic table. When Mendeleev grouped together elements by their properties, the electron was yet to be discovered, and the principles of electron configuration and quantum mechanics that underpin chemistry were still many decades away. However, Mendeleev’s approach not only resulted in a useful classification but could also make predictions: missing positions in the periodic table indicated potential new elements that were later confirmed experimentally. Mendeleev was also able to spot inaccuracies in atomic weight data of the time. Today, the search for patterns i