A Performance Analysis of High-Level MapReduce Query Languages in Big Data



Abstract
The current era is an era of big data analytics. One of the challenges of big data is mining relevant data out of huge volumes of databases in which the data is present in a variety of formats. MapReduce provides a viable solution for analyzing this type of data, but it also has some limitations and weaknesses. Hence, high-level query languages have evolved for querying massive amounts of data over MapReduce. In this research paper, the authors analyze the performance of three prominent high-level query languages, viz. Pig Latin, HiveQL, and JAQL, based on query processing time. The data is first stored in the Hadoop Distributed File System (HDFS), processed for the wordcount and web log processing benchmarks, and then analyzed. The experimental analysis of the three languages is performed on unstructured data by doubling the size of the dataset.

Keywords: High-level query languages, Pig, Hive, JAQL, Hadoop, Big data
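To make the wordcount benchmark concrete, the following is a minimal, single-process Python sketch of the map, shuffle, and reduce phases that a high-level wordcount query is translated into when it runs over MapReduce. It is an illustration under stated assumptions, not the authors' benchmark code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`, `run_doubling_experiment`), the tokenizer, and the sample input are all hypothetical, and a real Hadoop job runs these phases in parallel over HDFS blocks. The timing helper only mimics the idea of doubling the dataset size between measurements.

```python
import re
import time
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every token in every input line."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on a single machine."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


def run_doubling_experiment(base: List[str], rounds: int) -> List[Tuple[int, float]]:
    """Time word_count while doubling the input size each round.

    Returns (number_of_lines, seconds) pairs. This only mirrors the idea of
    doubling the dataset; it does not reproduce any Hadoop timings.
    """
    results: List[Tuple[int, float]] = []
    data = list(base)
    for _ in range(rounds):
        start = time.perf_counter()
        word_count(data)
        elapsed = time.perf_counter() - start
        results.append((len(data), elapsed))
        data = data + data  # double the dataset for the next round
    return results


if __name__ == "__main__":
    sample = ["the quick brown fox", "jumps over the lazy dog", "The dog sleeps"]
    print(word_count(sample)["the"])  # prints 3
    for n_lines, seconds in run_doubling_experiment(sample, rounds=4):
        print(f"{n_lines:>4} lines: {seconds:.6f} s")
```

Keeping the three phases as separate functions makes it easy to see which step a high-level language hides from the user: the query author writes only the grouping and counting logic, while the framework supplies the shuffle.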

Namrata Singh (corresponding author) and Sanjay Agrawal
Department of Computer Engineering and Applications, National Institute of Technical Teachers’ Training and Research, Bhopal, India

© Springer Science+Business Media Singapore 2016
S.C. Satapathy et al. (eds.), Proceedings of the International Congress on Information and Communication Technology, Advances in Intelligent Systems and Computing 438, DOI 10.1007/978-981-10-0767-5_57

1 Introduction
The current era is an age of digital revolution. The emerging trend in digital services and technology is to digitize every minute piece of information. With the growth of the internet, global communication and networking have increased. As a result, the need to store, transmit, and access this information or data has become very significant. Over the past few years, there has been a tremendous increase in the volume of data, which has given rise to the term big data. The term big data has been widely used to describe the exponential growth of data with respect to variety, volume, and velocity, and big data has thus become one of the major areas of research and analytics nowadays. The key contributors to the growth of this data are the internet, social media, sensors, smartphones, etc. This data needs to be stored and processed. Traditional storage and processing mechanisms, such as relational database management systems, have failed to process such large amounts of data. This big data problem is now being handled by various technologies such as NoSQL databases [1], Hadoop [2], etc. These technologies provide an effective platform for dealing with enormous amounts of data, which need to be effectively gathered, processed, and analyzed. Among them, Hadoop is one technology that can handle various types of data. Since the data originates from various domains, analytics has become a great challenge for big data. This data is very valuable and acts as a crucial component in analysis as the
