Hybrid Efficient Genetic Algorithm for Big Data Feature Selection Problems

Tareq Abed Mohammed1,2 · Oguz Bayat1 · Osman N. Uçan1 · Shaymaa Alhayali1

© Springer Nature B.V. 2019

Abstract

Because of the huge amount of data being generated from different sources, analyzing these data and extracting useful information from them has become a very complex task. The difficulty of dealing with big data optimization problems comes from many factors, such as the high number of features and the existence of missing data. Feature selection has become an important step in many data mining and machine learning algorithms, because it reduces the dimensionality of the optimization problem and increases the performance of classification or clustering algorithms. In this paper, a set of hybrid and efficient genetic algorithms is proposed to solve the feature selection problem when the data has a large number of features. The proposed algorithms use a new gene-weighted mechanism that adaptively classifies the features into strongly relevant features, weak or redundant features, and unstable features during the evolution of the algorithm. Based on this classification, the proposed algorithms give the strong features high priority and the weak features lower priority when generating new candidate solutions. At the same time, the proposed algorithms concentrate more on unstable features, which sometimes appear in and sometimes disappear from the best solutions of the population. The performance of the proposed algorithms is investigated using different datasets and feature selection algorithms. The results show that the proposed algorithms can outperform the other feature selection algorithms and effectively enhance classification performance on the tested datasets.

Keywords Feature selection · Evolutionary algorithms · Big data analysis · Artificial neural networks
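The gene-weighted idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the frequency-based classification rule, the thresholds `strong_thr` and `weak_thr`, the selection probabilities, and the function names are all hypothetical choices made here only to show how features might be labeled as strong, weak, or unstable and how those labels could bias the generation of new candidate solutions.

```python
import random


def classify_features(appearance_counts, generations,
                      strong_thr=0.8, weak_thr=0.2):
    """Label each feature by how often the best solutions selected it.

    appearance_counts[i] is the number of generations in which feature i
    was part of the best individual. The thresholds are assumptions made
    for illustration, not values taken from the paper.
    """
    labels = []
    for count in appearance_counts:
        freq = count / generations
        if freq >= strong_thr:
            labels.append("strong")      # almost always selected
        elif freq <= weak_thr:
            labels.append("weak")        # almost never selected
        else:
            labels.append("unstable")    # appears and disappears
    return labels


def selection_probability(label):
    """Assumed priorities: keep strong features, mostly drop weak ones,
    and sample unstable ones evenly so the search keeps exploring them."""
    return {"strong": 0.9, "weak": 0.1, "unstable": 0.5}[label]


def generate_candidate(labels, rng):
    """Build one binary feature mask (1 = feature selected)."""
    return [1 if rng.random() < selection_probability(label) else 0
            for label in labels]


if __name__ == "__main__":
    counts = [9, 1, 5, 10, 0]            # hypothetical history over 10 generations
    labels = classify_features(counts, generations=10)
    print(labels)
    print(generate_candidate(labels, random.Random(0)))
```

In this sketch the unstable features receive a selection probability of 0.5, which maximizes the variation of those genes across candidates; this is only one possible reading of "concentrating" on unstable features.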

* Oguz Bayat

1 Altinbas University College of Engineering, Istanbul, Turkey

2 Kirkuk University College of Science, Kirkuk, Iraq

1 Introduction

In recent years, the sharp increase in the amount of generated data has made it very important to develop new robust and scalable tools that can extract hidden knowledge and information from big data sets (John Walker 2014). When a dataset has a massive volume and includes both structured and unstructured data, it is called big data (Manyika et al. 2011; Zikopoulos and Eaton 2011). Big data has become a separate field within the computer engineering community, since it is difficult to process using traditional database and software techniques. Big data has other specific properties, such as velocity, which refers to the speed at which data is generated, and variety, which means the existence of structured and unstructured
