Hierarchical data replication strategy to improve performance in cloud computing


Najme MANSOURI1,2, Mohammad Masoud JAVIDI1,2, Behnam Mohammad Hasani ZADE1,2

1 Department of Computer Science, Shahid Bahonar University of Kerman, Kerman 76169-14111, Iran
2 Mahani Mathematical Research Center, Shahid Bahonar University of Kerman, Kerman 76169-14111, Iran

© Higher Education Press 2020

Abstract Cloud computing is attracting growing interest as a new trend in data management. Data replication has been widely applied to improve data access in distributed systems such as Grid and Cloud. However, due to the finite storage capacity of each site, copies that are useful for future jobs can be wastefully deleted and replaced with less valuable ones. Therefore, it is important to have an appropriate replication strategy that can dynamically store replicas while satisfying quality of service (QoS) requirements and storage capacity constraints. In this paper, we present a dynamic replication algorithm named hierarchical data replication strategy (HDRS). HDRS consists of replica creation, which adaptively increases the number of replicas based on an exponential growth or decay rate; replica placement, which follows the access load and a labeling technique; and replica replacement, which is based on the future value of a file. We evaluate different dynamic data replication methods using CloudSim simulation. Experiments demonstrate that HDRS can reduce response time and bandwidth usage compared with other algorithms. This means that HDRS can identify a popular file and replicate it to the best site. The method avoids useless replications and decreases access latency by balancing the load across sites.

Keywords cloud computing, data replication, multi-tier architecture, simulation, load balance

1 Introduction

Nowadays, cloud computing is a common computing model and marks a significant phase in the expansion of an increasing number of distributed applications [1–3]. Figure 1 shows the features of next-generation scientific study, the e-Science needs imposed by these characteristics, and the main enabling technologies. It indicates that web services, workflow, the semantic web, Grid computing, and Cloud computing (e.g., SaaS, PaaS, and IaaS) are some of the major enabling digital facilities for providing useful e-Infrastructure and application-oriented platforms [4].

Received March 20, 2019; accepted November 6, 2019
E-mail: [email protected]

Fig. 1 A summary of e-Science requirements and important enabling technologies [4]

As user requirements, the size of data files, and the demand to obtain knowledge from huge amounts of data all grow, the utility of plain storage elements that are easy to control at a large scale increases. Data is a key element in the Cloud system, so a suitable, well-maintained, and efficient platform must be provided to guarantee the dependability and availability of data. Cassandra and Hive in Facebook, and HBase in Stream are the most popular examples of