Novel Data Placement Algorithm for Distributed Storage System Based on Fault-Tolerant Domain
SHI Lianxing 1, WANG Zhiheng 2, LI Xiaoyong 1∗
(1. School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; 2. School of Electronic Information Engineering, Shanghai DianJi University, Shanghai 200240, China)
Received: 2019-11-09   Accepted: 2020-02-18
Foundation item: the Science and Technology Project of Minhang District in Shanghai (No. 2018MH331)
∗E-mail: [email protected]
© Shanghai Jiao Tong University and Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract: The 3-replica redundancy strategy is widely used to ensure data reliability in large-scale distributed storage systems, but its storage capacity utilization is only 33%. In this paper, a data placement algorithm based on the fault-tolerant domain (FTD) is proposed. Owing to the fine-grained design of the FTD, the data reliability of a system using two replicas is comparable to that of current mainstream systems using three replicas, while capacity utilization rises to 50%. Moreover, the FTD provides a new concept for the design of distributed storage systems: a system can take FTDs as its units of data placement, data migration, data repair, and so on. In addition, fault detection can be performed independently and concurrently within each FTD.

Key words: data reliability, failure domain, fault-tolerant domain, data placement, storage system, distributed system

CLC number: TP 391   Document code: A
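The abstract's central idea is to treat FTDs as the unit of data placement. As a purely illustrative sketch of that concept (the names FaultTolerantDomain and place_replicas are hypothetical, and the paper's actual placement rules are developed later in the text), the following Python fragment groups disks into fine-grained FTDs and spreads two replicas over distinct domains so that no single failure domain holds both copies:

    import random

    class FaultTolerantDomain:
        """Hypothetical FTD: a small group of disks that is fault-detected
        independently of other FTDs."""
        def __init__(self, ftd_id, disks):
            self.ftd_id = ftd_id
            self.disks = disks  # list of disk identifiers in this domain

    def place_replicas(obj_id, ftds, num_replicas=2):
        """Illustrative only: choose num_replicas distinct FTDs for one
        object, then one disk inside each, so the two replicas never
        share a failure domain."""
        rng = random.Random(obj_id)              # deterministic per object
        chosen = rng.sample(ftds, num_replicas)  # distinct domains
        return [(ftd.ftd_id, rng.choice(ftd.disks)) for ftd in chosen]

    # Example: 12 disks partitioned into 4 fine-grained FTDs of 3 disks each.
    disks = [f"disk-{i}" for i in range(12)]
    ftds = [FaultTolerantDomain(i, disks[3 * i:3 * i + 3]) for i in range(4)]
    print(place_replicas(obj_id=42, ftds=ftds))  # e.g. [(0, 'disk-2'), (3, 'disk-10')]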
0 Introduction

Data reliability is a critical target in the design of distributed storage systems. These systems typically comprise hundreds of standard, inexpensive storage servers with thousands of disks, which leads to frequent component failures. Statistics on disk failures in large-scale storage systems show that the disk failure rate increases significantly once a system has been running for 2-3 years[1]. The designers of the Google file system (GFS)[2] regard component failure as the norm rather than the exception, and they cope with it by saving multiple replicas on different servers.

There are three aspects to improving data reliability: the data redundancy strategy, the data distribution algorithm, and the data repair mechanism. Multi-replica and erasure coding are the two commonly used redundancy strategies. The multi-replica strategy is supported by most popular distributed storage systems, such as GFS, HDFS[3], CEPH[4], and FARSITE[5]. It not only provides high read and write performance but also consumes less network bandwidth during data repair. However, its storage space utilization is relatively low[6]. For example, the effective utilization of storage space is only 33% with three replicas.
Currently, the Reed-Solomon (RS) code[7] is the most widely used erasure code scheme. It divides raw data into k data blocks and generates m parity blocks by encoding the k data blocks. Subsequently, the whole data can be recovered from any k of the k + m blocks.
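To make these utilization figures concrete, here is a quick back-of-the-envelope check using the standard definitions (not code from the paper): n-way replication keeps n full copies, giving an effective utilization of 1/n, whereas an RS(k, m) code stores k + m blocks for every k blocks of raw data, giving k/(k + m):

    def replication_utilization(n: int) -> float:
        """Effective storage utilization of n-way replication: 1/n."""
        return 1 / n

    def rs_utilization(k: int, m: int) -> float:
        """Effective storage utilization of an RS(k, m) code: k / (k + m)."""
        return k / (k + m)

    print(f"3 replicas: {replication_utilization(3):.0%}")  # 33%, the figure cited above
    print(f"2 replicas: {replication_utilization(2):.0%}")  # 50%, the FTD scheme's target
    print(f"RS(4, 2):   {rs_utilization(4, 2):.0%}")        # 67%, tolerating any 2 lost blocks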