Architecting Scientific Data Systems in the Cloud

Daniel Crichton, Chris A. Mattmann, Luca Cinquini, Emily Law, George Chang, Sean Hardman, and Khawaja Shams

Abstract Scientists, educators, decision makers, students, and many others use scientific data produced by science instruments. They study our universe, make new discoveries in areas such as weather forecasting and cancer research, and shape policy decisions that impact nations fiscally, socially, economically, and in many other ways. Over the past 20 years or so, the data produced by these scientific instruments have increased in volume, complexity, and resolution, making it difficult for traditional computing infrastructures to scale up to handle them. This reality has led us, and others, to investigate the applicability of cloud computing to these scalability challenges. NASA’s Jet Propulsion Laboratory (JPL) is at the forefront of transitioning its science applications to the cloud environment. Through the Apache Object Oriented Data Technology (OODT) framework, NASA’s first software released at the open-source Apache Software Foundation (ASF), engineers at JPL have been able to scale the storage and computational aspects of their scientific data systems to the cloud, thus achieving reduced costs and improved performance. In this chapter, we report on the use of Apache OODT for cloud computing, citing several examples in a number of scientific domains. Experience and specific performance numbers are also reported, and directions for future work in the area are suggested.

D. Crichton (*) • L. Cinquini • E. Law • G. Chang • S. Hardman • K. Shams
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA

C.A. Mattmann
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA
Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA

Z. Mahmood (ed.), Cloud Computing: Methods and Practical Approaches, Computer Communications and Networks, DOI 10.1007/978-1-4471-5107-4_2, © Springer-Verlag London 2013

Keywords Cloud computing • Distributed computing • Data-intensive systems • Scientific data systems • Object-oriented computing • e-Science

2.1 Introduction

Cloud computing promises substantial, on-demand computing and storage capabilities, scalable by design and by agreed-upon levels of service. Within science, scalability has become an important need since the emergence of instruments capable of producing massive amounts of data. These instruments capture a variety of observations and generate copious amounts of data, be it remote observations of Mars, in situ ground sensors measuring greenhouse gases, or information related to feedback loops for oil and gas drilling areas. These measurements and the instruments that take them require highly scalable data-