Improving the Map and Shuffle Phases in Hadoop MapReduce
Abstract Massive amounts of data need to be processed, and their analysis has become a challenging issue for network-centric applications in data management. Advanced tools are required for processing and analyzing such data sets. As a proficient parallel computing programming model, MapReduce on Hadoop is employed for large-scale data analysis applications. However, MapReduce still suffers from performance problems, and it relies on the shuffle phase as a central element of its logical I/O strategy. The map phase requires an improvement in its performance because its output acts as the input to the next phase and its result determines overall efficiency; the map phase therefore needs intermediate checkpoints that regularly monitor all the splits generated by intermediate phases. The MapReduce model is designed so that a job must wait until all maps have accomplished their given tasks, which acts as a barrier to effective resource utilization. This paper implements shuffle as a service component to decrease the overall execution time of jobs, monitors the map phase through skew handling, and increases resource utilization in a cluster.

Keywords MapReduce · Data analytics · HDFS · Hadoop · Shuffle · Big data
1 Introduction

The objective of data analytics is inspecting, cleansing, transforming, and modeling data in order to extract useful information, suggest conclusions, and support decision making [1]. Data analysis has many facets and approaches, appearing under diverse names in business, science, and social science fields [2]. Big data is a particular branch of data analysis that focuses on analyzing huge data sets which arise from various fields of data-intensive informatics data
centers [3]. Big data typically comprises data sets of such massive volume that they exceed the ability of traditional software tools to analyze, handle, and process the data [4]. Programs written in the MapReduce functional style are automatically parallelized and executed on a large cluster of commodity machines [5, 6]. During program execution, the runtime system takes care of partitioning the input into splits and scheduling the many resulting operations, such as executing tasks across a set of machines, managing failures, and handling inter-machine communication [7]. The crucial drawback is exhibited in Hadoop's performance, which affects the whole cluster. The significant limitations of Hadoop are outlined below, and a sketch of a standard job follows the list to make the phase boundaries concrete:

(1) Distinct phases are merged into a single task: the execution of the reduce function is CPU intensive and memory intensive, as it must merge the map task data and produce the final outcome.
(2) Random I/O requests affect the shuffle phase: the task tracker receives plenty of I/O read requests. Each request w
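To make the map, shuffle, and reduce boundaries concrete, here is a minimal sketch of a standard Hadoop word-count job in Java using the org.apache.hadoop.mapreduce API. The class names are illustrative and the job is not taken from the paper; it only shows where the phases discussed above sit. Every intermediate (word, 1) pair emitted by the map phase must be fetched, partitioned, and merge-sorted by the shuffle before any reduce call can run, which is where the task tracker's I/O read requests in point (2) originate.

```java
// Minimal Hadoop word-count sketch (illustrative names; Hadoop 2.x+ mapreduce API).
// Map emits (word, 1); the framework shuffles and sorts all pairs by key;
// reduce then merges the values for each word.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Map phase: split each input line and emit (word, 1).
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE); // intermediate output, shuffled to reducers
        }
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Reduce phase: runs only after the shuffle has fetched and
      // merge-sorted every map output for this key.
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // shrinks data the shuffle must move
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The combiner line illustrates one standard lever on the problem described in point (2): pre-aggregating map output locally reduces the volume of intermediate data that the shuffle phase must read and transfer.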
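The skew handling mentioned in the abstract can be illustrated in the same API. The following is a hedged, hypothetical sketch, not the paper's implementation: a custom Partitioner that spreads an assumed hot key over a small band of reducers so that a single reduce task does not receive a disproportionate share of the shuffled data. The class name SkewAwarePartitioner, the hot key, and the fan-out constant are all assumptions made for illustration.

```java
// Hypothetical skew-mitigation sketch, not the paper's implementation:
// scatter one known hot key across several reducers to even out partitions.
import java.util.Random;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class SkewAwarePartitioner extends Partitioner<Text, IntWritable> {
  private static final String HOT_KEY = "the"; // assumed hot key, for illustration
  private static final int HOT_KEY_FANOUT = 4; // spread the hot key over 4 reducers
  private final Random random = new Random();

  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (numPartitions > HOT_KEY_FANOUT && key.toString().equals(HOT_KEY)) {
      // Scatter the hot key over a small band of reducers; a lightweight
      // second aggregation pass would recombine these partial results.
      return random.nextInt(HOT_KEY_FANOUT);
    }
    // Default hash partitioning for all other keys.
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}
```

A job would install this with job.setPartitionerClass(SkewAwarePartitioner.class); because the hot key's partial counts then land on several reducers, correctness requires a follow-up step that merges those partial results.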