TMaR: a two-stage MapReduce scheduler for heterogeneous environments
RESEARCH
Open Access
Neda Maleki1*, Hamid Reza Faragardi2, Amir Masoud Rahmani3,6,7, Mauro Conti4 and Jay Lofstead5

*Correspondence: [email protected]
1 Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran. Full list of author information is available at the end of the article.
Abstract

In the context of MapReduce task scheduling, many algorithms focus mainly on the scheduling of Reduce tasks, assuming that the scheduling of Map tasks is already done. However, in cloud deployments of MapReduce, the input data resides on remote storage, which makes the scheduling of Map tasks equally important. In this paper, we propose a two-stage Map and Reduce task scheduler for heterogeneous environments, called TMaR. TMaR schedules Map and Reduce tasks on the servers that minimize the task finish time in each stage, respectively. We employ a dynamic partition binder for Reduce tasks in the Reduce stage to lighten the shuffling traffic. Overall, TMaR minimizes the makespan of a batch of tasks in heterogeneous environments while taking network traffic into account. The simulation results demonstrate that TMaR outperforms Hadoop-stock and Hadoop-A in terms of makespan and network traffic, achieving average performance improvements of 29%, 36%, and 14% on the Wordcount, Sort, and Grep benchmarks, respectively. In addition, TMaR reduces power consumption by up to 12%.

Keywords: MapReduce, Hadoop, Heterogeneous systems, Scheduling, Performance, Shuffling, Power, Cloud computing
Introduction

Today, we are surrounded by a massive amount of data produced by social media, web surfing, embedded sensors, IoT nodes, and so on. According to the International Data Corporation (IDC) report in 2017, the size of the world's information is increasing and is projected to reach 140 ZB by 2050 [1]. Such a huge volume of data necessitates substantial horizontal scaling of resources [2], so that the massive produced data can be processed in parallel on distributed machines. One of the most popular parallel and distributed frameworks is MapReduce, introduced by Google in 2004 [3]. Hadoop [4] is an open-source implementation of MapReduce for cloud computing. Each MapReduce job consists of two dependent phases, Map and Reduce. The user-defined Map and Reduce tasks are distributed independently onto multiple resources in a tree-style network topology for parallel execution. The Shuffle phase performs an all-to-all remote fetch of intermediate data from the Map phase to the Reduce phase. It involves intensive data communication (flows) between resources and can significantly delay job completion. Therefore, effective use of resources such as computation and

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s)
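To make the Map, Shuffle, and Reduce phases described above concrete, the following is a minimal, single-machine sketch of the MapReduce programming model applied to word counting (one of the benchmarks evaluated in this paper). This illustrates only the general model, not TMaR's scheduler; the function names are illustrative, not part of Hadoop's API.

```python
from collections import defaultdict

def map_phase(document):
    """Map task: emit (word, 1) pairs for each word in the input split."""
    return [(word, 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group intermediate values by key.

    In a real cluster this is the all-to-all network exchange of
    intermediate data between Map and Reduce servers."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce task: aggregate the list of values for each key."""
    return {key: sum(values) for key, values in groups.items()}

# Two input splits, each processed independently by a Map task.
splits = ["map reduce map", "reduce shuffle map"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'map': 3, 'reduce': 2, 'shuffle': 1}
```

The shuffle step is the bottleneck that TMaR targets: in a distributed setting, each grouped partition must be fetched over the network by the Reduce server it is bound to, so the placement of Map output and the binding of partitions to Reduce tasks directly determine shuffling traffic.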