Big data are shaping the future of materials science


Ashley A. White, [email protected]

MRS Bulletin, Volume 38, August 2013

Big data mean different things to different people. In commerce, retailers extract trends from millions of consumer purchases to target advertising and increase profits. In health, Google and the US Centers for Disease Control and Prevention analyze vast amounts of search data to identify and curb potential flu outbreaks. In biology, genomic scientists use the human genome map to develop disease treatments. And in materials science, advances in data analysis have placed the field on the verge of a revolution in how researchers conduct their work, analyze properties and trends in their data, and even discover new materials.

The size of data sets that once would have boggled the mind is now almost commonplace. The entire book collection of the US Library of Congress could be stored in 15 terabytes, which pales in comparison to the 1.2 zettabytes (1.2 billion terabytes) of data created by humankind in 2010. But big data are about much more than size. Data scientists may disagree on the exact definition of big data, but they discuss data-related issues in terms of V's. The size, or volume, of the data is one component, but equally important are variety (the degree of complexity or heterogeneity in the data) and velocity (the speed of data access and processing). IBM recently coined a fourth V, veracity (the inherent trustworthiness of the data), while others include viability or value among their V's. In any field, data sets are considered "big" when they are large, complex, and difficult to process and analyze.

Materials science data tend to be particularly heterogeneous in terms of their type and source compared with data encountered in other fields. "We can easily generate large sets of data, whether it's from experiments like those performed using the Advanced Photon Source here at Argonne or from large simulations," said Olle Heinonen of the Materials Science Division at Argonne National Laboratory in Illinois. "What matters more are your capabilities for processing that data and ultimately getting something useful out of it."

One of the first steps in processing large data sets is data reduction. Experiments at the Large Hadron Collider, for example, retain only a small fraction of 1% of the data they produce. Storing and analyzing any more than the hundreds of megabytes per second deemed most valuable become impractical with current technologies. It is up to sophisticated software to determine which data are most relevant.

The Spallation Neutron Source at Oak Ridge National Laboratory (ORNL) in Tennessee, a user facility that carries out hundreds of materials science experiments each year, is capable of creating hundreds of gigabytes of data in a single experiment. This rate is beyond what a materials scientist can effectively analyze with typical technologies. In addition to reducing these data to something manageable, fast and easy data access (the "velo