Predicting the performance of big data applications on the cloud


D. Ardagna1 · E. Barbierato1 · E. Gianniti1 · M. Gribaudo1 · T. B. M. Pinto2 · A. P. C. da Silva2 · J. M. Almeida2

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Data science applications have become widespread as a means to extract knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns, and are thus often referred to as big data applications. These characteristics make it challenging for existing software and hardware infrastructures to meet the applications' resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, since its on-demand pricing model allows computing resources to be allocated effectively according to each application's needs. However, these properties pose an additional challenge to the accurate performance prediction of cloud-based applications, which is a key step towards adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad-hoc simulator, in various scenarios based on different applications and infrastructure setups. The approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with an average relative error of at most 7%. Moreover, a comparison with the widely used event-based simulator available in the Java Modeling Tools (JMT) suite demonstrates that both the analytical model and dagSim run very fast, with execution times at least two orders of magnitude lower than JMT's while providing slightly better accuracy, which makes them practical for online prediction.

Keywords Performance prediction · Apache Spark · Parallel computing · Data science · Big data · Analytical and simulation models

* A. P. C. da Silva. Extended author information available on the last page of the article.


1 Introduction

Data science has become widespread as a means to extract knowledge and insights from structured and unstructured data and turn them into business actions. The growing interest of enterprises in data-intensive business strategies has required the development of novel applications and the adoption of advanced technologies based on big data and machine learning. Such applications have moved from experimental setups to enterprise-wide deployments, bringing innovation and competitive advantage to many businesses [1]. Indeed, it has been reported that the data science platform market size reached USD 37.9 billion in 2019 and is expected to grow to USD 140.9 billion by 2024, at a compound annual growth rate o
