Provisioning Input and Output Data Rates in Data Processing Frameworks
Nam H. Do · Tien Van Do · Lóránt Farkas · Csaba Rotter
Received: 20 June 2018 / Accepted: 1 January 2020 © The Author(s) 2020
Abstract This paper is motivated by the need of deadline-bounded applications in live mobile network environments to obtain a guaranteed and appropriate share of the input and output (I/O) data rate. At present, however, data processing frameworks only support requests for memory and computing capacity. In this paper, we propose a solution that allows the control of disk I/O and network I/O for data processing applications in the YARN and Mesos frameworks. Experimental results show that our tool can provision the I/O data rate shares of competing data processing applications.
Keywords I/O enforcement · I/O data rate control · Cluster resource management · Hadoop YARN · Apache Mesos · HDFS
T. V. Do
Baoji University of Arts and Sciences, Shaanxi, China
e-mail: [email protected]
N. H. Do · T. V. Do
Department of Networked Systems and Services, Budapest University of Technology and Economics, Magyar tudósok körútja 2., Budapest, Hungary
L. Farkas · C. Rotter
Nokia Bell Labs Hungary, Bókay János utca 36-42, Budapest, Hungary
1 Introduction and Motivation
When a specific application submits a job, a data processing framework such as Apache Hadoop [1, 26], Hadoop YARN [25], or Mesos [2] reserves and allocates the computing resources necessary for the execution of the job. Many of the resource models include the amount of memory and the number of virtual CPU cores, but do not contain information on the disk I/O capability of the commodity servers or on the network I/O rate. Since a computing cluster can be built from heterogeneous hardware and software components, the I/O data rate perceived by applications is unpredictable because of contention for the resources of the physical servers, and there is also a need to monitor applications running on these platforms [12]. As shown in [11, 24], I/O contention among applications degrades the quality of service.
Nowadays, telecommunication operators often use such frameworks to regularly process big data sets with specific deadlines in their computing clusters. Therefore, guaranteeing the I/O data rate and controlling how the available data rate is shared among applications are critical issues in the environment of telecommunication operators. When applications compete for a resource at the hardware and network level, which is hidden from programmers (and therefore from applications), they may suffer I/O performance degradation.
Motivated by this need, we design a complete solution that can be applied to provision the I/O data rate of applications in both the Mesos and YARN frameworks. Note that this result has gradually been improved over the years based on our previous experience [10, 11, 24]. We demonstrate that the proposed functionalities can be integrated into two popular data processing frameworks, Mesos and YARN, to cont