Hierarchical Data Deduplication Technology Based on Bloom Filter Array
Jian Zhang, Shujuan Zhang, Yilin Lu, Xingyu Zhang and Shaochun Wu
Abstract In recent years, data deduplication technology has become a research hotspot. To reduce the time and storage space requirements of deduplication, we propose a hierarchical deduplication approach that eliminates redundant data at both the file level and the block level, and we introduce a Bloom filter (BF) to filter fingerprints and accelerate the search process. To further reduce the false positive rate of the BF, the concept of a Bloom filter array (BFA) is applied. The performance results show that this strategy can effectively alleviate the pressure on storage and network transmission, raise the proportion of data eliminated as redundant, and ensure a higher data deduplication speed.
Keywords Hierarchy · BFA · Deduplication technology · Backup
J. Zhang (✉) · S. Zhang · Y. Lu · X. Zhang · S. Wu
School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
Z. Zhong (ed.), Proceedings of the International Conference on Information Engineering and Applications (IEA) 2012, Lecture Notes in Electrical Engineering 216, DOI: 10.1007/978-1-4471-4856-2_88, © Springer-Verlag London 2013
88.1 Introduction

In recent years, with the dramatic growth of Internet applications, data storage requirements have increased substantially. Storage capacities across many industries range from dozens of GB to hundreds of TB, or even several PB, so backup systems face severe challenges [1]. Research has found that up to 60 % of the data stored in a backup system is redundant, and data deduplication in backup systems has therefore become a hot research topic. On the one hand, the technology eliminates identical files or blocks in the backup system to optimize the utilization of storage space. On the other hand, it reduces the amount of data sent over the network, thereby reducing energy consumption and network costs and saving network bandwidth for data replication [2]. Analysts believe that deduplication is one of the most important emerging technologies and that it will rewrite the economic rules of the storage industry.

This paper describes a hierarchical architecture based on a Bloom filter array (BFA) for data deduplication in backup systems. In this architecture, data redundancy is eliminated at the file level and at the chunk level, and duplicate chunks are checked with a Bloom filter (BF), which noticeably accelerates the process of identifying duplicate data. We then apply the concept of the BFA to reduce the false positive rate of the BF.

The remainder of the paper is organized as follows. In Sect. 88.2, we describe the related research. Section 88.3 briefly presents the new BFA-based framework for hierarchically eliminating redundant data in the backup system. Section 88.4 pr
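
As a rough illustration of the hierarchical scheme outlined in this introduction, the following Python sketch checks a file-level fingerprint first and only then falls back to chunk-level fingerprints screened by an array of Bloom filters. The class names, the fixed-size chunking, the SHA-1 fingerprints, the filter sizes, and the rule that routes each fingerprint to one filter by its first byte are all assumptions made for illustration; the sketch does not reproduce the authors' actual BFA design.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings, probed with salted SHA-256."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # One bit position per hash function; the seed byte acts as a salt.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class BloomFilterArray:
    """Assumed BFA layout: each fingerprint is routed to exactly one filter."""

    def __init__(self, num_filters: int = 4, bits_per_filter: int = 1 << 16,
                 num_hashes: int = 4) -> None:
        self.filters = [BloomFilter(bits_per_filter, num_hashes)
                        for _ in range(num_filters)]

    def _route(self, fingerprint: bytes) -> BloomFilter:
        # Hypothetical routing rule: first fingerprint byte modulo array size.
        return self.filters[fingerprint[0] % len(self.filters)]

    def add(self, fingerprint: bytes) -> None:
        self._route(fingerprint).add(fingerprint)

    def __contains__(self, fingerprint: bytes) -> bool:
        return fingerprint in self._route(fingerprint)


class HierarchicalDeduplicator:
    """File-level check first, then chunk-level check screened by the BFA."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.file_index = {}   # file fingerprint -> list of chunk fingerprints
        self.chunk_store = {}  # chunk fingerprint -> chunk bytes
        self.bfa = BloomFilterArray()

    def backup(self, data: bytes) -> int:
        """Back up one file and return the number of bytes newly stored."""
        file_fp = hashlib.sha1(data).digest()
        if file_fp in self.file_index:       # level 1: whole file is a duplicate
            return 0
        written, recipe = 0, []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fp = hashlib.sha1(chunk).digest()
            recipe.append(fp)
            # Level 2: a BFA miss is definitive; a hit may be a false
            # positive, so it is confirmed against the exact chunk index.
            if fp in self.bfa and fp in self.chunk_store:
                continue
            self.bfa.add(fp)
            self.chunk_store[fp] = chunk
            written += len(chunk)
        self.file_index[file_fp] = recipe
        return written
```

With a 4-byte chunk size, backing up `b"AAAABBBBCCCC"` stores 12 bytes, backing it up again stores nothing because the file-level fingerprint matches, and backing up `b"AAAAXXXXCCCC"` stores only the 4 bytes of the one unseen chunk. The filter answers most lookups for new chunks without touching the exact index, which is the speed-up the Bloom filter is meant to provide.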