An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems
Abdol Karim Javanmardi1 · S. Hadi Yaghoubyan1,3 · Karamollah BagheriFard1,3 · Samad Nejatian2,3 · Hamid Parvin4,5,6

Accepted: 22 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Job scheduling in Hadoop has been investigated in several studies. However, some challenges facing Hadoop clusters, including minimum share (min-share), cluster heterogeneity, execution time estimation, and scheduler program size, have received less attention. One of the most important min-share algorithms is the FAIR scheduler, developed by Facebook Inc. for its own needs, in which an equal min-share is assigned to all users. In this article, an attempt has been made to make the proposed method superior to existing methods through automation and configuration, performance optimization, fairness, and data locality. A high-level architectural model is designed, and a scheduler is then defined on this model. The scheduler contains four components: three components schedule jobs, and one component distributes the data of each job among the nodes. The scheduler can be executed on heterogeneous Hadoop clusters and can run jobs in parallel, with disparate min-shares assigned to each job or user. Moreover, an approach is presented for each of the problems associated with min-share, cluster heterogeneity, execution time estimation, and scheduler program size; each of these approaches can also be utilized on its own to improve the performance of other scheduling algorithms. The scheduler presented in this paper showed acceptable performance compared with the First-In, First-Out (FIFO) and FAIR schedulers.

Keywords Scheduling · Hadoop · High-level architecture · Minimum share · Heterogeneous clusters
* S. Hadi Yaghoubyan
[email protected]
Extended author information available on the last page of the article
1 Introduction

Big data is a field of computer science that addresses methods for analyzing information and extracting metadata from data sets. A number of software tools have been developed so far for processing such data, which can be structured, semi-structured, or unstructured [1]. In big data, the data is characterized by the concepts of velocity, variety, volume, and veracity. Data sets with many records can offer high statistical power, whereas data with high complexity may merely increase the false discovery rate [2]. Data capturing, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, data provenance, scheduling methods, etc., are among the challenges of big data. Various schedulers have been designed for big data processing software, including Hadoop, which was developed by Doug Cutting as a set of open-source projects, and different algorithms have been presented for scheduling jobs in such systems.
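To make the min-share notion concrete, the following Java sketch illustrates a FAIR-style slot allocation: each job first receives its configured minimum share, and leftover capacity is then handed out round-robin to jobs with unmet demand. The Job record and its demand and minShare fields are hypothetical names introduced here for illustration; this is a minimal sketch of the general idea behind min-share scheduling, not the scheduler proposed in this paper.

import java.util.LinkedHashMap;
import java.util.Map;

public class MinShareSketch {

    // Hypothetical per-job description: total slot demand and guaranteed min-share.
    record Job(String name, int demand, int minShare) {}

    static Map<String, Integer> allocate(Job[] jobs, int totalSlots) {
        Map<String, Integer> alloc = new LinkedHashMap<>();
        int remaining = totalSlots;

        // Pass 1: satisfy each job's min-share, capped by its demand
        // and by the slots still available.
        for (Job j : jobs) {
            int grant = Math.min(Math.min(j.minShare(), j.demand()), remaining);
            alloc.put(j.name(), grant);
            remaining -= grant;
        }

        // Pass 2: distribute leftover slots one at a time, round-robin,
        // to jobs whose demand is not yet met.
        boolean progress = true;
        while (remaining > 0 && progress) {
            progress = false;
            for (Job j : jobs) {
                if (remaining == 0) break;
                int cur = alloc.get(j.name());
                if (cur < j.demand()) {
                    alloc.put(j.name(), cur + 1);
                    remaining--;
                    progress = true;
                }
            }
        }
        return alloc;
    }

    public static void main(String[] args) {
        Job[] jobs = {
            new Job("etl", 10, 4),    // heavy job with a guaranteed min-share of 4
            new Job("report", 3, 2),  // light job with a guaranteed min-share of 2
            new Job("adhoc", 6, 0)    // best-effort job with no guaranteed share
        };
        // With 12 slots this prints {etl=7, report=3, adhoc=2}.
        System.out.println(allocate(jobs, 12));
    }
}

Note that unlike the FAIR scheduler's equal min-share per user, the sketch accepts a disparate min-share per job, which is the capability the architecture presented in this paper targets.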