Modeling Replication and Erasure Coding in Large Scale Distributed Storage Systems Based on CEPH
Abstract The efficiency of storage systems is a key factor in ensuring the sustainability of data centers devoted to providing cloud services. Proper management of storage infrastructures can ensure the best trade-off between costs, reliability and quality of service, enabling the provider to be competitive in the market. The heterogeneity of nodes, and the need for frequent expansion and reconfiguration of the subsystems, have fostered the development of efficient approaches that replace traditional data replication with more advanced techniques, such as those that leverage erasure codes. In this paper we use an ad-hoc discrete event simulation approach to study the performance of replication and erasure coding under different parametric configurations, aiming at minimizing overhead while obtaining the desired reliability. The approach is demonstrated with a practical application to the erasure coding plugins of the increasingly popular CEPH distributed file system.
Keywords Performance modeling · Cloud computing and big data infrastructures · Storage systems · Erasure codes · CEPH
D. Manini, Dip. di Informatica, Università di Torino, Corso Svizzera 185, 10129 Torino, Italy. e-mail: [email protected]
M. Gribaudo, Dip. di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy. e-mail: [email protected]
M. Iacono (corresponding author), Dip. di Scienze Politiche, Seconda Università degli Studi di Napoli, Viale Ellittico 31, 81100 Caserta, Italy. e-mail: [email protected]
© Springer International Publishing Switzerland 2016. L. Caporarello et al. (eds.), Digitally Supported Innovation, Lecture Notes in Information Systems and Organisation 18. DOI 10.1007/978-3-319-40265-9_20
1 Introduction

The management of huge computing infrastructures, typical of the cloud computing oriented market, is a challenge that a provider has to face in order to keep pace with competitors. Besides the technical factors, costs are the main lever on which providers have to build their strategies. Efficiency in using expensive resources, such as energy, computation and storage, is an effective way to balance costs and revenues while providing affordable services with sufficient quality. The complexity of such infrastructures requires a greater management effort, but paves the way to more sophisticated solutions for pursuing efficiency. The authors have already investigated the main aspects of massively distributed architectures for data centers in [2–8, 12]. In this paper, which extends the results presented in [12] and applies them to an emerging storage technology for data centers, we present a simulation-based approach for the evaluation of erasure-coding-based techniques for space- and performance-efficient data resilience. Our approach relies on user-defined storage entities that group blocks across different nodes to improve system reliability, exploiting erasure codes to define and implement data redundancy with low space and computing overhead. With respect to …
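To make the replication versus erasure coding trade-off concrete, the following Python sketch contrasts the space overhead of n-way replication with that of a (k, m) erasure code, and demonstrates single-failure recovery using the simplest possible code, a (k, 1) XOR parity. This is a minimal illustration with hypothetical helper names (replication_overhead, erasure_overhead, xor_parity, recover); it is not the paper's simulator, nor the Reed-Solomon-style codes implemented by CEPH's erasure coding plugins.

```python
from functools import reduce

def replication_overhead(n: int) -> float:
    """Extra space consumed by n-way replication, relative to raw data size."""
    return float(n - 1)          # e.g. 3 replicas -> 200% overhead

def erasure_overhead(k: int, m: int) -> float:
    """Extra space of a (k, m) erasure code: m coding chunks per k data chunks."""
    return m / k                 # e.g. k = 4, m = 2 -> 50% overhead

def xor_parity(chunks):
    """Bytewise XOR of equal-length chunks: the simplest (k, 1) erasure code."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

def recover(surviving, parity):
    """Rebuild the single lost chunk by XOR-ing the parity with the survivors."""
    return xor_parity(surviving + [parity])

if __name__ == "__main__":
    data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]   # k = 4 data chunks
    parity = xor_parity(data)                     # m = 1 coding chunk
    lost = data.pop(1)                            # one storage node fails
    assert recover(data, parity) == lost          # the data survives the failure
    print(f"3-way replication overhead: {replication_overhead(3):.0%}")
    print(f"(k=4, m=1) erasure overhead: {erasure_overhead(4, 1):.0%}")
```

The source of the space savings is visible in the arithmetic: to tolerate m concurrent failures, replication must store m + 1 full copies (overhead m), while a (k, m) erasure code stores k + m chunks, each 1/k of the object size (overhead m/k).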