An improved query optimization process in big data using ACO-GA algorithm and HDFS map reduce technique
Deepak Kumar1 · Vijay Kumar Jha1
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Storing and retrieving data within a specific time frame is fundamental to any application today, so an efficiently designed query allows the user to obtain results in the desired time and builds credibility for the corresponding application. To overcome the difficulties of query optimization, this paper proposes an improved query optimization process for big data (BD) using the ACO-GA algorithm and HDFS map-reduce. The proposed methodology consists of two phases, namely a BD arrangement phase and a query optimization phase. In the first phase, the input data is pre-processed by computing a hash value (HV) with the SHA-512 algorithm and removing repeated data with the HDFS map-reduce function. Then, features such as closed frequent patterns, support, and confidence are extracted. Next, the support and confidence are managed using an entropy calculation. Based on the entropy calculation, the related information is grouped using the Normalized K-Means (NKM) algorithm. In the second phase, the BD queries are collected and the same features are extracted. Next, the optimized query is found using the ACO-GA algorithm. Finally, the similarity assessment process is performed. The experimental outcomes illustrate that the proposed algorithm outperforms other existing algorithms.

Keywords Secure Hash Algorithm (SHA-512) · Hadoop Distributed File System (HDFS) · Normalized K-Means (NKM) algorithm · Ant Colony Optimization-Genetic Algorithm (ACO-GA)
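As an illustration of the pre-processing step summarised above, the sketch below shows how repeated records can be detected through SHA-512 hash values in a map/reduce style. It is a simplified, in-memory Python sketch under assumed names (map_phase, reduce_phase, records), not the authors' HDFS map-reduce implementation, which runs distributed on a Hadoop cluster.

```python
# Minimal sketch: SHA-512 hash value (HV) computation and duplicate removal,
# mimicking a map step (emit hash/record pairs) and a reduce step (group by
# hash and keep one record per group). Illustrative only, not the paper's code.
import hashlib
from collections import defaultdict

def map_phase(records):
    """Map step: emit (SHA-512 hash value, record) pairs."""
    for record in records:
        hv = hashlib.sha512(record.encode("utf-8")).hexdigest()
        yield hv, record

def reduce_phase(mapped):
    """Reduce step: group by hash value and keep one record per group,
    so exact duplicates are removed."""
    groups = defaultdict(list)
    for hv, record in mapped:
        groups[hv].append(record)
    return [recs[0] for recs in groups.values()]

if __name__ == "__main__":
    records = ["alpha,1", "beta,2", "alpha,1", "gamma,3"]
    print(reduce_phase(map_phase(records)))  # ['alpha,1', 'beta,2', 'gamma,3']
```

In a real deployment the grouping by hash value would be performed by the Hadoop shuffle between mappers and reducers rather than by an in-memory dictionary.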
* Deepak Kumar [email protected]
Vijay Kumar Jha [email protected]
1 Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, India
1 Introduction

The analysis of large collections of data is a routine activity in numerous commercial and academic organizations. Internet companies, for example, collect massive quantities of data, such as content produced by service logs, web crawlers, and click-streams [1], and some of the storage systems they use are BD, cloud computing, etc. Data that exceed the storage space of a server and its processing power are called BD [2, 3]. In this era, software platforms are needed to solve dynamic multi-objective BD optimization problems [4]. A BD processing platform is, by definition, a computing platform for processing BD [5]. Current academic research and industrial practice on databases emphasize performance more than energy efficiency [6]. Such data cannot be managed by conventional RDBMS [7, 8] or standard statistical tools. Scrutinizing these data sets might require processing tens or hundreds of terabytes of data. To perform this task, many companies rely on highly distributed software systems running on large clusters of commodity hardware.