NOMAD: The FAIR concept for big data-driven materials science
Introduction

The discovery of improved and novel—not just new—materials, or of unknown properties of known materials, to meet specific scientific or industrial requirements is one of the most exciting and economically important applications of high-performance computing (HPC) to date. The convergence of theoretical physics and chemistry, materials science and engineering, and computer science into computational materials science enables the modeling of materials, both existing ones and those that may be created in the future, at the electronic and atomic levels. It also allows for the accurate prediction of how these materials will behave at the microscopic and macroscopic levels, and an understanding of their suitability for specific research and commercial applications.

Computational high-throughput screening initiatives,*,1–5 significantly boosted by the US Materials Genome Initiative,6,7 address this challenge by computing the properties of many thousands of possible materials. When looking more closely, however, one realizes that such studies have been inefficiently and ineffectively exploited so far, as only a tiny fraction of the information contained in all the computed data is actually used. This applies to high-throughput screening as well as to individual theoretical and experimental investigations. Unfortunately, besides a few numbers, tables, or graphs that appear in resulting publications, the wealth of other information contained
in the full research work is typically disregarded or even deleted. Progress toward changing this situation and fully realizing comprehensive data sharing has been slow, characterized more by lip service from science policy and funding agencies than by real commitment and support.8 This has held back scientific advancement.

In this context, and in the area of computational materials science, the NOMAD (Novel Materials Discovery) Laboratory, a European Center of Excellence (CoE),9 whose compute and storage resources are hosted at the Max Planck Computing and Data Facility in Garching, Germany, has assumed a pioneering role, considering all aspects of what is now called findable, accessible, interoperable, and reusable (FAIR)†,10 handling of data: Data are findable for anyone interested; they are stored in a way that makes them easily accessible;11 their representation follows accepted standards,12,13 and all specifications are open—hence data are interoperable.‡ All of this enables the data to be used for research questions that may differ from their original purpose; hence data are repurposable.§

We illustrate the latter with an example. Let’s assume a research team has investigated TiO2 for heterogeneous catalysis, where TiO2 is an important support material. The results published in a research journal are typically not useful for researchers who are interested in the same material,
Claudia Draxl, Humboldt-Universität zu Berlin, and Fritz-Haber-Institut Berlin, Germany; [email protected]
Matthias Scheffler, Fritz-Haber-Institut Berlin, and Humboldt-Universität zu Berlin, Germany