DASM: A Dynamic Adaptive Forward Assembly Area Method to Accelerate Restore Speed for Deduplication-Based Backup Systems
1 Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China [email protected]
2 Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
Abstract. Data deduplication plays an important role in modern backup systems because of its demonstrated ability to improve storage efficiency. However, in deduplication-based backup systems, the resulting fragmentation problem has drawn ever-increasing attention as backup frequency grows, because it degrades restore speed. Various methods have been proposed to address this problem. However, most of them gain restore speed at the expense of deduplication ratio, which is not efficient. In this paper, we present a Dynamic Adaptive Forward Assembly Area Method, called DASM, to accelerate restore speed for deduplication-based backup systems. DASM exploits the fragmentation information within the restored backup streams and dynamically trades off between chunk-level cache and container-level cache. DASM is a pure data restoration module that pursues optimal read performance without sacrificing deduplication ratio. Meanwhile, DASM is a resource-independent and cache-efficient scheme that works well under different memory footprint restrictions. To demonstrate the effectiveness of DASM, we conduct several experiments under various backup workloads. The results show that DASM is sensitive to fragmentation granularity and can accurately adapt to changes in fragmentation size. In addition, the experiments show that DASM improves the restore speed of the traditional LRU and ASM methods by up to 58.9% and 57.1%, respectively.
Keywords: Data deduplication · Restore speed · Reliability · Cache policy · Performance evaluation
1 Introduction
© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing AG 2016. All Rights Reserved. G.R. Gao et al. (Eds.): NPC 2016, LNCS 9966, pp. 58–70, 2016. DOI: 10.1007/978-3-319-47099-3_5
Data deduplication is an effective technique used to improve storage efficiency in modern backup systems [2,13]. A typical deduplication-based backup system partitions backup streams into variable-size or fixed-size chunks. Each data chunk is identified by a fingerprint calculated using a cryptographic hash function, such as SHA-1 [11]. Two chunks are considered duplicates if they have identical fingerprints. For each chunk, the deduplication system consults a key-value store, also referred to as the fingerprint index, to identify possible duplicates. Only new chunks are physically stored in containers, while duplicates are eliminated. However, in backup systems, the deviation between physical locality and logical locality grows as backup frequency increases, which leads to the physical dispersion of subsequent backup streams. The consequent exhausting fragmentation problem has drawn ever-increasing
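The pipeline described above (chunking, fingerprinting, index lookup, container storage) can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the system evaluated in the paper: the class `DedupStore`, the constants `CHUNK_SIZE` and `CONTAINER_CAPACITY`, and the helper `containers_touched` are hypothetical names, chunking is fixed-size for simplicity, and an in-memory dictionary stands in for the fingerprint index.

```python
import hashlib

CHUNK_SIZE = 4          # bytes per fixed-size chunk (tiny, for illustration only)
CONTAINER_CAPACITY = 2  # chunks per container (hypothetical value)


class DedupStore:
    """Toy deduplicating store: a fingerprint index plus append-only containers."""

    def __init__(self):
        self.index = {}         # fingerprint -> (container id, slot in container)
        self.containers = [[]]  # each container is a list of chunk payloads

    def backup(self, stream: bytes) -> list:
        """Store a backup stream and return its recipe (ordered fingerprints)."""
        recipe = []
        for offset in range(0, len(stream), CHUNK_SIZE):
            chunk = stream[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha1(chunk).hexdigest()
            if fingerprint not in self.index:
                # New chunk: open a fresh container if the current one is full.
                if len(self.containers[-1]) >= CONTAINER_CAPACITY:
                    self.containers.append([])
                cid = len(self.containers) - 1
                self.containers[cid].append(chunk)
                self.index[fingerprint] = (cid, len(self.containers[cid]) - 1)
            # Duplicates are not stored again; only the recipe references them.
            recipe.append(fingerprint)
        return recipe

    def restore(self, recipe) -> bytes:
        """Rebuild a stream by looking up every fingerprint in the index."""
        locations = (self.index[fp] for fp in recipe)
        return b"".join(self.containers[cid][slot] for cid, slot in locations)

    def containers_touched(self, recipe) -> list:
        """Container ids that a restore of this recipe has to read."""
        return sorted({self.index[fp][0] for fp in recipe})


store = DedupStore()
first = store.backup(b"AAAABBBBCCCCDDDD")
second = store.backup(b"AAAAEEEEDDDDFFFF")  # shares AAAA and DDDD with the first
print(store.containers_touched(first), store.containers_touched(second))
```

In this toy run the second backup adds only two new chunks, yet restoring it has to read containers written during both backups. That scattering of a logically sequential stream across physically distant containers is the kind of dispersion the paragraph above refers to.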