Feature construction as a bi-level optimization problem
ORIGINAL ARTICLE
Marwa Hammami¹ • Slim Bechikh¹,³ • Ali Louati² • Mohamed Makhlouf⁴ • Lamjed Ben Said¹
Received: 8 May 2019 / Accepted: 10 February 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract
Feature selection and feature construction are important preprocessing techniques in data mining. They reduce dimensionality and can also improve classification accuracy and efficiency. While feature selection consists of selecting a subset of relevant features from the original feature set, feature construction generates new high-level features, called constructed features, each of which is a combination of a subset of the original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem in which the feature subset is defined first and the corresponding (near-)optimal combination of the selected features is then found. Motivated by this observation, we propose a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection while optimizing the feature combinations at the lower level by evolving a follower population. For each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near-)optimal feature combination (constructed feature). In this way, BFC-GA is able to output a set of optimized constructed features that can be highly informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. The statistical analysis of the obtained results shows that our bi-level feature construction approach is competitive with, and outperforms, many state-of-the-art algorithms.

Keywords: Feature construction • Data classification • Bi-level optimization • Evolutionary algorithms
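The nested structure described in the abstract can be illustrated with a minimal sketch. It is not the BFC-GA implementation: the operator set `OPS`, the left-to-right operator-chain encoding of a constructed feature, the midpoint-threshold rule used as a stand-in classifier, and the names `construct`, `fitness`, `evolve` and `bfc_sketch` are all assumptions made for illustration. What it does show is the bi-level idea itself: each upper-level individual (a feature-subset bit mask) is scored by running a full lower-level search for the best combination of the selected features.

```python
import random

# Hypothetical operator set for combining features; not the paper's function set.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def construct(row, subset, ops):
    """Fold the selected feature values left to right with the given operators."""
    out = row[subset[0]]
    for op, i in zip(ops, subset[1:]):
        out = OPS[op](out, row[i])
    return out

def fitness(X, y, subset, ops):
    """Training accuracy of a midpoint-threshold rule on the constructed feature.

    Assumes binary labels 0/1 with both classes present (a stand-in classifier).
    """
    f = [construct(row, subset, ops) for row in X]
    f0 = [v for v, label in zip(f, y) if label == 0]
    f1 = [v for v, label in zip(f, y) if label == 1]
    m0, m1 = sum(f0) / len(f0), sum(f1) / len(f1)
    thr = (m0 + m1) / 2
    # Predict class 1 on the side of the threshold where the class-1 mean lies.
    hits = sum(((v > thr) == (m1 > m0)) == (label == 1) for v, label in zip(f, y))
    return hits / len(y)

def evolve(new_individual, score, mutate, rng, pop_size, generations):
    """Tiny elitist GA: keep the best half, refill with mutated one-point crossovers."""
    pop = [new_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: max(2, pop_size // 2)]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(len(a) + 1)
            children.append(mutate(a[:cut] + b[cut:]))
        pop = parents + children
    return max(pop, key=score)

def bfc_sketch(X, y, seed=0, upper_pop=6, upper_gens=4, lower_pop=6, lower_gens=4):
    """Upper level evolves feature subsets; each one triggers a lower-level search."""
    rng = random.Random(seed)
    n = len(X[0])
    names = list(OPS)
    cache = {}  # lower-level result per subset, so each subset is solved once

    def solve_lower(mask):
        key = tuple(mask)
        if key not in cache:
            subset = [i for i, bit in enumerate(mask) if bit]
            best_ops = evolve(
                lambda: [rng.choice(names) for _ in subset[1:]],
                lambda ops: fitness(X, y, subset, ops),
                lambda ops: [rng.choice(names) if rng.random() < 0.2 else o for o in ops],
                rng, lower_pop, lower_gens)
            cache[key] = (fitness(X, y, subset, best_ops), best_ops)
        return cache[key]

    def repair(mask):
        # An empty subset cannot produce a constructed feature; switch one bit on.
        if not any(mask):
            mask[rng.randrange(n)] = 1
        return mask

    best_mask = evolve(
        lambda: repair([rng.randint(0, 1) for _ in range(n)]),
        lambda mask: solve_lower(mask)[0],  # upper-level score = best lower-level fitness
        lambda mask: repair([1 - b if rng.random() < 0.1 else b for b in mask]),
        rng, upper_pop, upper_gens)
    acc, ops = solve_lower(best_mask)
    return best_mask, ops, acc

# Toy data (made up): the label depends on x0 - x2; x1 is noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]
y = [1 if row[0] - row[2] > 0 else 0 for row in X]
mask, ops, acc = bfc_sketch(X, y)
print("subset mask:", mask, "operators:", ops, "training accuracy:", acc)
```

Caching the lower-level result per subset keeps the upper-level score stable across generations and avoids re-solving the same lower-level problem, which is the main cost of a nested scheme like this one.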
1 Introduction
Classification is one of the important tasks in machine learning and data mining; it aims to assign each instance in a dataset to one of several classes based on its features. It is difficult to determine which features are useful without prior knowledge. Moreover, not all features in a feature vector are essential: many are irrelevant or redundant, which may negatively affect the classification accuracy and reduce the quality of the whole feature set due to the large search space, a problem known as "the curse of dimensionality" [1].

Corresponding author: Slim Bechikh, [email protected]
¹ SMART lab, University of Tunis, ISG-Campus, Tunis, Tunisia
² Information Systems Department, Prince Sattam bin Abdulaziz University, Alkharj 11942, Kingdom of Saudi Arabia
³ LMVSR, Kennesaw State University, Kennesaw, GA, USA
⁴ Kedge Business School, Talence, France