An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems
Abdol Karim Javanmardi1 · S. Hadi Yaghoubyan1,3 · Karamollah BagheriFard1,3 · Samad Nejatian2,3 · Hamid Parvin4,5,6

Accepted: 22 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Job scheduling in Hadoop has been investigated in several studies. However, some challenges facing Hadoop clusters, including minimum share (min-share), cluster heterogeneity, execution time estimation, and scheduler program size, have received less attention. One of the most important min-share algorithms is the FAIR scheduler, developed by Facebook Inc. for its own needs, in which an equal min-share is assigned to all users. In this article, an attempt has been made to make the proposed method superior to existing methods through automation and configuration, performance optimization, fairness, and data locality. A high-level architectural model is designed, and a scheduler is then defined on this model. The scheduler contains four components: three components schedule jobs, and one component distributes the data of each job among the nodes. The scheduler can be executed on heterogeneous Hadoop clusters and can run jobs in parallel, with disparate min-shares assigned to each job or user. Moreover, an approach is presented for each of the problems associated with min-share, cluster heterogeneity, execution time estimation, and scheduler program size; each of these approaches can also be utilized on its own to improve the performance of other scheduling algorithms. The scheduler presented in this paper showed acceptable performance compared with the First-In, First-Out (FIFO) and FAIR schedulers.

Keywords Scheduling · Hadoop · High-level architecture · Minimum share · Heterogeneous clusters
* S. Hadi Yaghoubyan
[email protected]
Extended author information available on the last page of the article
1 Introduction

Big data is a field of computer science that addresses methods for analyzing information and extracting metadata from data sets. A number of software tools have been developed so far for processing such data, which can be structured, semi-structured, or unstructured [1]. In big data, the data is characterized by the concepts of velocity, variety, volume, and veracity. Data sets with many records can offer high statistical power, whereas data with high complexity may merely increase the false discovery rate [2]. Data capturing, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, data provenance, scheduling methods, etc., are among the challenges of big data. Various schedulers have been designed for big data processing software, including Hadoop, which was developed by Doug Cutting as a set of open-source projects, and different algorithms have been presented for scheduling jobs in such systems.
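To make the min-share notion concrete, the following Java sketch illustrates a FAIR-style slot allocation: each job first receives its configured minimum share, and leftover capacity is then handed out round-robin to jobs with unmet demand. The Job record and its demand and minShare fields are hypothetical names introduced here for illustration; this is a minimal sketch of the general idea behind min-share scheduling, not the scheduler proposed in this paper.

import java.util.LinkedHashMap;
import java.util.Map;

public class MinShareSketch {

    // Hypothetical per-job description: total slot demand and guaranteed min-share.
    record Job(String name, int demand, int minShare) {}

    static Map<String, Integer> allocate(Job[] jobs, int totalSlots) {
        Map<String, Integer> alloc = new LinkedHashMap<>();
        int remaining = totalSlots;

        // Pass 1: satisfy each job's min-share, capped by its demand
        // and by the slots still available.
        for (Job j : jobs) {
            int grant = Math.min(Math.min(j.minShare(), j.demand()), remaining);
            alloc.put(j.name(), grant);
            remaining -= grant;
        }

        // Pass 2: distribute leftover slots one at a time, round-robin,
        // to jobs whose demand is not yet met.
        boolean progress = true;
        while (remaining > 0 && progress) {
            progress = false;
            for (Job j : jobs) {
                if (remaining == 0) break;
                int cur = alloc.get(j.name());
                if (cur < j.demand()) {
                    alloc.put(j.name(), cur + 1);
                    remaining--;
                    progress = true;
                }
            }
        }
        return alloc;
    }

    public static void main(String[] args) {
        Job[] jobs = {
            new Job("etl", 10, 4),    // heavy job with a guaranteed min-share of 4
            new Job("report", 3, 2),  // light job with a guaranteed min-share of 2
            new Job("adhoc", 6, 0)    // best-effort job with no guaranteed share
        };
        // With 12 slots this prints {etl=7, report=3, adhoc=2}.
        System.out.println(allocate(jobs, 12));
    }
}

Note that unlike the FAIR scheduler's equal min-share per user, the sketch accepts a disparate min-share per job, which is the capability the architecture presented in this paper targets.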