Tools for Aggregating, Analyzing and Mining Combinatorial Data
1159-G03-08
Wesley B. Jones¹, Changwon Suh¹, Peter A. Graf¹, Daniel Korytina², Craig Swank¹, Christopher Perkins¹
¹ National Renewable Energy Laboratory, 1617 Cole Blvd., Golden, CO 80401, USA
² University of Colorado at Boulder, Boulder, CO, USA

ABSTRACT

We demonstrate how data mining techniques can be applied to complex combinatorial data sets and how data from multiple sources can be aggregated through a scientific data management system that we have developed. As an example, we study composition, processing, structure, and property relationships of transparent conducting oxides by applying data mining techniques such as principal component analysis to aggregated combinatorial data. Mapping the mined results enables visualization of data trends and identification of anomalies in Fourier transform infrared spectroscopy patterns, as well as of scientifically interesting libraries and spectral regions.

INTRODUCTION

In recent years, combinatorial materials synthesis and high-throughput experimentation have evolved dramatically, allowing a faster and more systematic search for new target materials. Materials informatics has emerged to address the issues associated with larger, more complex data sets, and to take greater advantage of the knowledge that might be gained from integrating data. The multivariate data include process parameters, chemistry, crystal structure, and different physical properties collected with the aid of many analytical tools. This combinatorial approach relies less on human intuition and more on extracting composition, processing, structure, and property relationships through systematic experimentation. When combinatorial data are analyzed, materials informatics is particularly valuable because existing physical models of material properties do not cross length scales from the atomistic level to the mesoscale.
It serves as a unique “data probe” for exploring large, complex data sets across different length scales by exposing relationships between diverse types of data. The field of materials informatics consists of two core parts: data mining and data management. Many proposed methodologies have focused on the data mining challenge of analyzing large, diverse sets of data [1,2]. Efforts to integrate heterogeneous scientific information for materials discovery include laboratory information management system (LIMS) projects [3] and languages such as MatDL [4]. Data mining of combinatorial data sets is a formidable task because of the lack of scientific data management systems (SDMS) that integrate heterogeneous data and extract knowledge and insight from databases. Integrated, well-organized data sets should therefore guide the design of query systems that make it possible to query across databases, exchange information, and distribute data in materials design. Web interfaces for well-organized data sets are also powerful for providi
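
As a rough illustration of the principal component analysis mentioned in the abstract, the sketch below projects a set of spectra onto their leading principal components and flags spectra whose scores lie far from the rest. It is not the authors' implementation. The data are synthetic stand-ins for FTIR spectra, the function names `pca_scores` and `flag_outliers` are hypothetical, and the outlier rule (a robust z-score based on the median absolute deviation) is an assumption chosen for simplicity.

```python
import numpy as np


def pca_scores(spectra, n_components=2):
    """Project mean-centered spectra onto their leading principal components."""
    centered = spectra - spectra.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes (loadings).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, vt[:n_components], explained[:n_components]


def flag_outliers(scores, threshold=3.5):
    """Return indices of samples whose distance in score space is unusually large.

    Assumed rule (not from the paper): robust z-score of the distance to the
    per-component median, scaled by the median absolute deviation (MAD).
    """
    dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
    mad = np.median(np.abs(dist - np.median(dist)))
    robust_z = 0.6745 * (dist - np.median(dist)) / mad
    return np.where(robust_z > threshold)[0]


# Synthetic stand-in for a library of 40 spectra sampled at 300 wavenumbers.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 4000, 300)
base_band = np.exp(-(((wavenumbers - 1500) / 200) ** 2))
amplitudes = rng.uniform(0.9, 1.1, size=40)
spectra = amplitudes[:, None] * base_band + rng.normal(0, 0.01, size=(40, 300))

# Inject one anomalous spectrum with an extra band near 3000 cm^-1.
spectra[7] += 0.8 * np.exp(-(((wavenumbers - 3000) / 100) ** 2))

scores, loadings, explained = pca_scores(spectra)
outliers = flag_outliers(scores)
print("explained variance fractions:", explained)
print("flagged sample indices:", outliers)
```

Plotting the two score columns against each other gives the kind of low-dimensional map in which trends and anomalous samples become visible, and the loadings indicate which spectral regions drive each component.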