Modeling Replication and Erasure Coding in Large Scale Distributed Storage Systems Based on CEPH
Abstract The efficiency of storage systems is a key factor in ensuring the sustainability of data centers devoted to providing cloud services. Proper management of storage infrastructures can ensure the best trade-off between costs, reliability and quality of service, enabling the provider to be competitive in the market. The heterogeneity of nodes, and the need for frequent expansion and reconfiguration of the subsystems, have fostered the development of efficient approaches that replace traditional data replication with more advanced techniques, such as those that leverage erasure codes. In this paper we use an ad-hoc discrete event simulation approach to study the performance of replication and erasure coding under different parametric configurations, aiming at minimizing overhead while obtaining the desired reliability. The approach is demonstrated with a practical application to the erasure coding plugins of the increasingly popular CEPH distributed file system.
Keywords Performance modeling · Cloud computing and big data infrastructures · Storage systems · Erasure codes · CEPH
D. Manini, Dip. di Informatica, Università di Torino, Corso Svizzera 185, 10129 Torino, Italy. e-mail: [email protected]
M. Gribaudo, Dip. di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy. e-mail: [email protected]
M. Iacono (corresponding author), Dip. di Scienze Politiche, Seconda Università degli Studi di Napoli, Viale Ellittico 31, 81100 Caserta, Italy. e-mail: [email protected]
© Springer International Publishing Switzerland 2016. L. Caporarello et al. (eds.), Digitally Supported Innovation, Lecture Notes in Information Systems and Organisation 18. DOI 10.1007/978-3-319-40265-9_20
1 Introduction

The management of huge computing infrastructures, typical of the cloud computing oriented market, is a challenge that a provider has to face in order to keep pace with competitors. Besides the technical factors, costs are the main lever on which providers have to build their strategies. Efficiency in using expensive resources, such as energy, computation and storage, is an effective way to balance costs and revenues while providing affordable services with sufficient quality. The complexity of such infrastructures requires a greater management effort, but paves the way to more sophisticated solutions for pursuing efficiency. The authors have already investigated the main aspects of massively distributed architectures for data centers in [2–8, 12]. In this paper, which extends the results presented in [12] and applies them to an emerging storage technology for data centers, we present a simulation-based approach for the evaluation of erasure-coding-based techniques for space- and performance-efficient data resilience. Our approach relies on user-defined storage entities that group blocks across different nodes to improve system reliability, exploiting erasure codes to define and implement data redundancy with low space and computing overhead. With respect to …
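To make the replication versus erasure coding trade-off concrete, the following Python sketch contrasts the space overhead of n-way replication with that of a (k, m) erasure code, and demonstrates single-failure recovery using the simplest possible code, a (k, 1) XOR parity. This is a minimal illustration with hypothetical helper names (replication_overhead, erasure_overhead, xor_parity, recover); it is not the paper's simulator, nor the Reed-Solomon-style codes implemented by CEPH's erasure coding plugins.

```python
from functools import reduce

def replication_overhead(n: int) -> float:
    """Extra space consumed by n-way replication, relative to raw data size."""
    return float(n - 1)          # e.g. 3 replicas -> 200% overhead

def erasure_overhead(k: int, m: int) -> float:
    """Extra space of a (k, m) erasure code: m coding chunks per k data chunks."""
    return m / k                 # e.g. k = 4, m = 2 -> 50% overhead

def xor_parity(chunks):
    """Bytewise XOR of equal-length chunks: the simplest (k, 1) erasure code."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

def recover(surviving, parity):
    """Rebuild the single lost chunk by XOR-ing the parity with the survivors."""
    return xor_parity(surviving + [parity])

if __name__ == "__main__":
    data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]   # k = 4 data chunks
    parity = xor_parity(data)                     # m = 1 coding chunk
    lost = data.pop(1)                            # one storage node fails
    assert recover(data, parity) == lost          # the data survives the failure
    print(f"3-way replication overhead: {replication_overhead(3):.0%}")
    print(f"(k=4, m=1) erasure overhead: {erasure_overhead(4, 1):.0%}")
```

The source of the space savings is visible in the arithmetic: to tolerate m concurrent failures, replication must store m + 1 full copies (overhead m), while a (k, m) erasure code stores k + m chunks, each 1/k of the object size (overhead m/k).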