Handling Large and Complex Data in a Photovoltaic Research Institution Using a Custom Laboratory Information Management System
Robert R. White and Kristin Munch
National Renewable Energy Laboratory, 1617 Cole Blvd., Golden, CO 80401, USA
ABSTRACT
Twenty-five years ago, the desktop computer began to become ubiquitous in the scientific lab. Researchers were delighted with its ability to both control instrumentation and acquire data on a single system, but they were not completely satisfied. They often saw gaps in knowledge that they thought could be filled if they just had more data and could get it faster. Computer technology has evolved in keeping with Moore's Law to meet those desires; however, those improvements have lately become both a boon and a bane for researchers. Computers can now produce high-speed data streams containing terabytes of information, a capability that evolved faster than anyone envisioned in the last century. Software to handle large scientific data sets has not kept up. How much information is lost through accidental mismanagement, and how many discoveries are missed through data overload, are now vital questions. An important new task in most scientific disciplines is developing methods to address these issues and creating software that can handle large data sets with scalability in mind. This software must produce archived, indexed, and searchable data from heterogeneous instrumentation to support a strong data-driven materials development strategy. A few years ago, at the National Center for Photovoltaics at the National Renewable Energy Laboratory, we began developing a Laboratory Information Management System (LIMS) designed to meet lab-wide needs for acquiring, managing, processing, and mining physics and materials science data, with a specific focus on scalability to accommodate new equipment and new research directions. We will present the decisions, processes, and problems we encountered while building our LIMS for materials research, its current operational state, and our plans for future development.
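To make the requirement for archived, indexed, and searchable data from heterogeneous instruments more concrete, the following is a minimal Python sketch of a metadata index. It is not the NREL LIMS described here: the paper does not specify its storage layer, so SQLite from the Python standard library is used only as a stand-in, and the table `measurements`, the functions `open_index`, `archive_record`, and `find_by_sample`, and the sample and instrument names are all hypothetical. The sketch assumes raw files are stored elsewhere; the index records which sample and instrument each file came from, plus a SHA-256 checksum so an archived file can later be verified.

```python
import hashlib
import sqlite3


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical metadata index for instrument files."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS measurements (
               id INTEGER PRIMARY KEY,
               sample_id TEXT NOT NULL,
               instrument TEXT NOT NULL,
               file_path TEXT NOT NULL,
               sha256 TEXT NOT NULL)"""
    )
    # Index on sample_id so "find everything measured on sample X" stays fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sample ON measurements(sample_id)")
    return conn


def archive_record(conn: sqlite3.Connection, sample_id: str, instrument: str,
                   file_path: str, payload: bytes) -> str:
    """Record where a raw file lives and its checksum; returns the hex digest.

    The payload itself is not stored here; it is assumed to be archived elsewhere.
    """
    digest = hashlib.sha256(payload).hexdigest()
    conn.execute(
        "INSERT INTO measurements (sample_id, instrument, file_path, sha256) "
        "VALUES (?, ?, ?, ?)",
        (sample_id, instrument, file_path, digest),
    )
    conn.commit()
    return digest


def find_by_sample(conn: sqlite3.Connection, sample_id: str) -> list:
    """Return (instrument, file_path) pairs for one sample, sorted by instrument."""
    rows = conn.execute(
        "SELECT instrument, file_path FROM measurements "
        "WHERE sample_id = ? ORDER BY instrument",
        (sample_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_index()
    # Two different (hypothetical) instruments measuring the same sample.
    archive_record(conn, "S-001", "xrd", "raw/s001_xrd.csv", b"2theta,counts\n")
    archive_record(conn, "S-001", "jv", "raw/s001_jv.csv", b"V,I\n")
    archive_record(conn, "S-002", "xrd", "raw/s002_xrd.csv", b"2theta,counts\n")
    print(find_by_sample(conn, "S-001"))
    # [('jv', 'raw/s001_jv.csv'), ('xrd', 'raw/s001_xrd.csv')]
```

Keying the index on a shared sample identifier is what lets data from otherwise unrelated instruments be searched together; a real system would also need to handle differing file formats and metadata schemas across instruments, which this sketch ignores.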
INTRODUCTION
The scope and capabilities of computers to support scientific research and experimentation have grown enormously in the past fifty years. Computers are now tightly integrated into most experimental systems, handling everything from instrument control to data acquisition and, in some cases, analysis. While this integration has generally improved our ability to use computers for research, it has also created a deluge of data from large high-resolution images, high-resolution temporal data, extremely large or high-dimensional data sets, and advanced simulations and modeling. Data streams can now deliver hundreds of megabytes, if not gigabytes, of data very quickly, but our ability to store and process those data effectively has not kept up. This problem is not limited to photovoltaics; it affects many scientific fields [1-4]. In addition, distributed research