Mixed Mode Analytics Architecture for Data Deduplication in Wireless Personal Cloud Computing



C. Vijesh Joe¹ · Jennifer S. Raj² · S. Smys³

Accepted: 29 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract  A large amount of data is generated every second around the world, and technology evolves each day to handle and store such enormous data efficiently. Even with these leaps in technology, providing storage space for the gigantic volume of data generated globally remains a challenge. One of the main problems facing storage servers and data centers is data redundancy: the same data, or slightly modified data, is stored on the same storage server repeatedly, so multiple copies of the same data occupy extra space. Data deduplication overcomes this issue. Existing deduplication techniques lack an efficient way of identifying duplicate data. The proposed method uses a mixed-mode analytical architecture to address data deduplication, introducing three levels of mapping. Each level deals with a different aspect of the data and the operations carried out to place unique sets of data in the cloud server. The focus is on effectively ruling out duplicated data and optimizing data storage in the cloud server.

Keywords  Encryption · Data deduplication · Hashing · Metadata manager · Cloud sparsing algorithm · Block-level dedup · Fingerprint
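The keywords point at fingerprint-based, block-level deduplication. As a generic illustration of that idea (not the authors' three-level mapping architecture), the following Python sketch splits data into fixed-size blocks, fingerprints each block with SHA-256, and stores each unique block only once; a "recipe" of fingerprints suffices to reconstruct the original:

```python
import hashlib

def deduplicate_blocks(data: bytes, block_size: int = 4096):
    """Fixed-size block-level deduplication sketch.

    Returns (store, recipe): `store` maps a SHA-256 fingerprint to the
    unique block bytes; `recipe` is the ordered list of fingerprints
    needed to reconstruct the original data.
    """
    store = {}
    recipe = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()  # the block's fingerprint
        if fp not in store:                     # keep only unique blocks
            store[fp] = block
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original data from the recipe of fingerprints."""
    return b"".join(store[fp] for fp in recipe)
```

For input with repeated content, the store holds far fewer blocks than the recipe references; real systems additionally use variable-size (content-defined) chunking so that small insertions do not shift every block boundary.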

Correspondence: C. Vijesh Joe [email protected] · Jennifer S. Raj [email protected] · S. Smys [email protected]

1 Department of CSE, VV College of Engineering, Tisaiyanvilai, India
2 Department of ECE, Gnanamani College of Technology, Pachal, India
3 Department of CSE, RVS Technical Campus, Kannampalayam, India

1 Introduction

In this modern era, everything is digitized. Every piece of data is stored in the cloud server, and useful information retrieved from it enables evolution to the next level. The data consumed globally in 2008 was approximately 200 terabytes, but by 2018 it had surpassed 5000 petabytes. Over the next two years, this figure is expected to grow manifold. One of the major reasons for inefficient data storage is the presence of replicated data on the internet. To overcome this challenge, we need to focus on eradicating duplicated files. Cloud services support users in storing their data via centralized zones; hence, eliminating duplication in the cloud paves an efficient way to store and handle the data. But the challenges facing cloud services are the rapid increase in the volume of data, confidentiality, and privacy. Also, since the data is encrypted for storage purposes, it is hard and time-consuming to analyze a file. Cloud computing provides virtualized storage of information across the digital world. Many multinational corporations (MNCs) have invested in cloud technologies as cloud service providers
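The tension noted above, that encrypting data for storage makes it hard to analyze for duplicates, is commonly addressed in the literature by convergent encryption, where the encryption key is derived from the content itself, so identical plaintexts yield identical ciphertexts that the server can deduplicate without reading them. The sketch below is a generic illustration of that property, not the scheme proposed in this paper; the XOR keystream stands in for a real cipher and is not secure:

```python
import hashlib

def convergent_key(plaintext: bytes) -> bytes:
    # Key derived from the content: identical files -> identical keys.
    return hashlib.sha256(plaintext).digest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Illustrative XOR keystream only -- NOT a secure cipher. A real
    # system would use a proper deterministic cipher keyed this way.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))
```

Because both the key and the keystream are deterministic functions of the content, two users uploading the same file produce byte-identical ciphertexts, and applying `toy_encrypt` again with the same key recovers the plaintext (XOR is its own inverse).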