A new ensemble feature selection approach based on genetic algorithm
METHODOLOGIES AND APPLICATION
Hongzhi Wang1 · Chengquan He1 · Zhuping Li1
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
In ensemble feature selection, adjusting the weight assigned to each feature subset can change the ensemble's performance significantly; finding the optimal weight vector is therefore a key and challenging problem. Aiming at this optimization problem, this paper proposes an ensemble feature selection approach based on a genetic algorithm (EFS-BGA). After each base feature selector generates a feature subset, EFS-BGA obtains an optimized weight for each feature subset through a genetic algorithm, unlike traditional genetic-algorithm approaches that operate directly on single features. We divide EFS-BGA into two variants: the first is a complete ensemble feature selection method; building on it, we further propose a selective EFS-BGA model. Through mathematical analysis, we then explain why weight adjustment is an optimization problem and how it can be solved. Finally, comparative experiments on multiple data sets demonstrate in practice the advantages of EFS-BGA over previous ensemble feature selection algorithms.

Keywords Ensemble feature selection · Optimization problem · Genetic algorithm
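As a concrete illustration of the weighted-ensemble idea summarized in the abstract, the sketch below aggregates the feature subsets produced by base selectors under a weight vector: each feature's score is the sum of the weights of the subsets that contain it. The subsets, weights, and scoring rule here are illustrative assumptions, not the paper's exact formulation.

```python
def weighted_feature_scores(subsets, weights):
    """Aggregate base feature subsets into per-feature scores.

    subsets : list of sets of feature indices, one per base selector
    weights : one weight per subset (the vector EFS-BGA optimizes)
    """
    scores = {}
    for subset, w in zip(subsets, weights):
        for f in subset:
            # A feature accumulates the weight of every subset it appears in.
            scores[f] = scores.get(f, 0.0) + w
    return scores


# Three hypothetical base selectors and an example weight vector.
subsets = [{0, 1, 3}, {1, 2}, {1, 3, 4}]
weights = [0.5, 0.2, 0.3]
scores = weighted_feature_scores(subsets, weights)
# Feature 1 appears in all three subsets, so it receives the highest score.
top = sorted(scores, key=scores.get, reverse=True)[:2]
```

Changing the weight vector reorders the final feature ranking, which is why the choice of weights is itself an optimization problem.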
1 Introduction

The ensemble feature selection method generates an optimized feature subset from a plurality of feature subsets that have already been obtained, using some integration strategy (Saeys et al. 2008). If multiple feature selection algorithms are applied to the same training set to obtain multiple feature subsets, the ensemble is heterogeneous; if the same feature selection algorithm is applied to different training sets, the ensemble is homogeneous. In our previous work, we analyzed the heterogeneous ensemble feature selection method. This paper mainly analyzes the homogeneous ensemble feature selection method: the original data set is sampled multiple times by the bootstrap method, and the resulting training subsets are used to train the same feature selection algorithm. Because the training data differ, the feature subsets generated by training differ as well (Mitchell et al. 2014). By adjusting the weights of the feature subsets, the contribution of each feature subset to the training performance can be changed. The theoretical analysis below shows that the generalization error of the weighted ensemble feature selection model is better than that of the unweighted model. For multiple feature subsets, obtaining the optimal weight vector can be considered an optimization problem. Aiming at this optimization problem, this paper proposes a new ensemble feature selection algorithm that uses a genetic algorithm.

Communicated by V. Loia.

Hongzhi Wang (corresponding author) [email protected]
1 Harbin Institute of Technology, Harbin, China
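The search for the weight vector described above can be sketched as a simple genetic algorithm over candidate weight vectors. This is a minimal sketch, not the paper's EFS-BGA procedure: the fitness function below is a synthetic stand-in (in EFS-BGA, fitness would come from evaluating the weighted ensemble on validation data), and all names, population sizes, and mutation rates are illustrative assumptions.

```python
import random

random.seed(0)

N_SUBSETS = 4

# Hypothetical "true" utility of each base feature subset; stands in for
# the validation performance the real fitness function would measure.
TRUE_UTILITY = [0.1, 0.7, 0.15, 0.05]


def fitness(weights):
    # Reward weight vectors whose normalized mass aligns with subset utility.
    total = sum(weights)
    return sum((w / total) * u for w, u in zip(weights, TRUE_UTILITY))


def crossover(a, b):
    # One-point crossover of two parent weight vectors.
    point = random.randrange(1, N_SUBSETS)
    return a[:point] + b[point:]


def mutate(w, rate=0.2):
    # Perturb each weight with small Gaussian noise; keep weights positive.
    return [max(1e-6, x + random.gauss(0, 0.1)) if random.random() < rate else x
            for x in w]


def run_ga(pop_size=20, generations=50):
    pop = [[random.random() for _ in range(N_SUBSETS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        pop = parents + children
    return max(pop, key=fitness)


best = run_ga()
```

Because selection is elitist, the best fitness never decreases across generations, and the returned weight vector concentrates mass on the subsets the fitness function rewards.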