Hybrid Efficient Genetic Algorithm for Big Data Feature Selection Problems
Tareq Abed Mohammed1,2 · Oguz Bayat1 · Osman N. Uçan1 · Shaymaa Alhayali1
© Springer Nature B.V. 2019
Abstract Due to the huge amount of data being generated from different sources, analyzing these data and extracting useful information from them has become a very complex task. The difficulty of big data optimization problems comes from many factors, such as the high number of features and the presence of missing data. Feature selection has become an important step in many data mining and machine learning algorithms, because it reduces the dimensionality of the optimization problem and increases the performance of classification or clustering algorithms. In this paper, a set of hybrid and efficient genetic algorithms is proposed to solve the feature selection problem when the handled data has a large number of features. The proposed algorithms use a new gene-weighted mechanism that adaptively classifies the features into strongly relevant features, weak or redundant features, and unstable features during the evolution of the algorithm. Based on this classification, the proposed algorithm gives strong features high priority and weak features low priority when generating new candidate solutions. At the same time, the proposed algorithm concentrates more on unstable features, which sometimes appear in and sometimes disappear from the best solutions of the population. The performance of the proposed algorithms is investigated using different datasets and feature selection algorithms. The results show that the proposed algorithms can outperform the other feature selection algorithms and effectively enhance classification performance on the tested datasets.
Keywords Feature selection · Evolutionary algorithms · Big data analyzing · Artificial neural networks
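The abstract describes the gene-weighted mechanism only at a high level. The following is a minimal, hypothetical Python sketch of the idea, not the authors' implementation. It assumes that each chromosome is a 0/1 feature mask, that a gene's weight is the fraction of elite chromosomes that select that feature, that two fixed thresholds (the hypothetical `STRONG_THRESHOLD` and `WEAK_THRESHOLD`) split features into strong, weak, and unstable classes, and that new candidates are drawn with a per-class inclusion probability. All function names, thresholds, and probabilities are illustrative assumptions.

```python
import random
from typing import List, Sequence

# Hypothetical thresholds (assumptions, not values from the paper).
STRONG_THRESHOLD = 0.8  # selected in >= 80% of elite solutions -> "strong"
WEAK_THRESHOLD = 0.2    # selected in <= 20% of elite solutions -> "weak"


def gene_frequencies(elite: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of elite chromosomes (0/1 feature masks) that select each feature."""
    n_features = len(elite[0])
    return [sum(ch[i] for ch in elite) / len(elite) for i in range(n_features)]


def classify_genes(freqs: Sequence[float]) -> List[str]:
    """Label each feature 'strong', 'weak', or 'unstable' from its frequency."""
    labels = []
    for f in freqs:
        if f >= STRONG_THRESHOLD:
            labels.append("strong")
        elif f <= WEAK_THRESHOLD:
            labels.append("weak")
        else:
            labels.append("unstable")  # appears in some elite solutions, not others
    return labels


def sample_candidate(labels: Sequence[str], rng: random.Random,
                     p_strong: float = 0.9, p_weak: float = 0.1,
                     p_unstable: float = 0.5) -> List[int]:
    """Draw a new 0/1 chromosome whose inclusion probability depends on each gene's label."""
    probs = {"strong": p_strong, "weak": p_weak, "unstable": p_unstable}
    return [1 if rng.random() < probs[lab] else 0 for lab in labels]


if __name__ == "__main__":
    # Toy elite population: 5 chromosomes over 4 features (illustrative data only).
    elite = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 0, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0]]
    labels = classify_genes(gene_frequencies(elite))
    print(labels)  # → ['strong', 'weak', 'unstable', 'weak']
    print(sample_candidate(labels, random.Random(42)))
```

Setting `p_unstable = 0.5` leaves unstable genes maximally uncertain in each new candidate, which is one simple way to keep exploring them while strong and weak genes stay mostly fixed; the paper's actual weighting and update rules may differ.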
* Oguz Bayat (corresponding author)
1 Altinbas University College of Engineering, Istanbul, Turkey
2 Kirkuk University College of Science, Kirkuk, Iraq
1 Introduction

In recent years, the large increase in the amount of generated data has made it very important to develop new robust and scalable tools that can extract hidden knowledge and information from big data sets (John Walker 2014). When a dataset has a massive volume and includes both structured and unstructured data, it is called big data (Manyika et al. 2011; Zikopoulos and Eaton 2011). Big data has become a distinct field in the computer engineering community because it is difficult to process with traditional database and software techniques. Big data has other specific properties, such as velocity, which refers to the speed at which data is generated, and variety, which means the existence of structured and unstructured