A hybrid feature selection scheme for mixed attributes data

Haitao Liu · Ruxiang Wei · Guoping Jiang

Received: 2 August 2012 / Accepted: 28 September 2012 / Published online: 15 March 2013
© SBMAC - Sociedade Brasileira de Matemática Aplicada e Computacional 2013

Abstract Feature selection aims at reducing the number of features in many applications. Existing feature selection approaches mainly deal with classification problems that have either continuous or discrete attributes. However, data usually come with mixed attributes in real-world applications. In this paper, a hybrid feature selection (HFS) scheme is proposed to deal with mixed attributes data. First, a new correlation measure between mixed attributes is defined by giving a model for calculating the mutual information between continuous and discrete attributes; second, the features are evaluated by a filter model with the new correlation measure; finally, feature selection is completed by optimizing the parameter in the filter model against an estimation accuracy criterion. Experimental results show that HFS achieves better stability and estimation accuracy.

Keywords Feature selection · Mixed attributes · Mutual information · Filter · Wrapper · Case-based reasoning
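The first step of the scheme rests on estimating mutual information between a continuous attribute and a discrete one. The paper's own model for this quantity is defined later in the text; as a rough illustration of the general idea only, the sketch below estimates I(X; Y) for a continuous feature X and a discrete class label Y using class-conditional Parzen (Gaussian kernel) density estimates and a Monte Carlo average over the sample. The function name mi_continuous_discrete and the kernel-density choice are assumptions for illustration, not the authors' definition.

```python
# Illustrative sketch only: one possible way to estimate mutual information
# between a continuous attribute x and a discrete class attribute y, in the
# spirit of the continuous-discrete model referred to in the abstract.
import numpy as np
from scipy.stats import gaussian_kde


def mi_continuous_discrete(x, y):
    """Monte Carlo estimate of I(X; Y) in nats for continuous x, discrete y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                    # P(Y = k)

    # Class-conditional densities p(x | Y = k) via Gaussian (Parzen) kernels;
    # each class needs at least two distinct values for the KDE to be defined.
    kdes = {k: gaussian_kde(x[y == k]) for k in classes}

    # p(x_i | Y = k) for every sample/class pair, and the marginal p(x_i).
    cond = np.vstack([kdes[k](x) for k in classes])   # shape (K, n)
    marginal = priors @ cond                          # shape (n,)

    # I(X; Y) = E[log p(x | y) - log p(x)], averaged over the observed pairs.
    row = {k: i for i, k in enumerate(classes)}
    own = cond[[row[k] for k in y], np.arange(len(y))]
    return float(np.mean(np.log(own) - np.log(marginal)))


# Toy check: a class-dependent feature should score higher than pure noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
informative = labels + 0.5 * rng.standard_normal(500)
noise = rng.standard_normal(500)
print(mi_continuous_discrete(informative, labels))  # clearly above zero
print(mi_continuous_discrete(noise, labels))        # close to zero
```

In the HFS scheme as summarized above, such a relevance score would feed the filter model of the second step, with the filter's parameter then tuned against estimation accuracy in the final, wrapper-style step.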

Communicated by José Mario Martinez.

H. Liu (B) · G. Jiang
Department of Equipment E&M, Naval University of Engineering, Wuhan 430033, People's Republic of China
e-mail: [email protected]

R. Wei
College of Science, Naval University of Engineering, Wuhan 430033, People's Republic of China

1 Introduction

Feature selection (also known as variable selection or attribute selection) plays an important role in machine learning and pattern recognition (Hu et al. 2010; Guyon and Elisseeff 2003). It selects the most effective features from the original feature set in order to reduce the dimension of the feature space according to certain criteria (Sheng 2000).
Through feature selection, irrelevant or redundant features are removed, thereby reducing computational complexity, improving the estimation accuracy of the learning model and making the model easier to interpret (Amiri et al. 2011; Cakır et al. 2011). A great number of feature selection approaches have been developed in recent years. Two key issues in constructing a feature selection approach are the search strategy and the evaluation criterion (Yao et al. 2012; Mao et al. 2007). With regard to the search strategy, global (Somol et al. 2004), heuristic (Dash and Liu 2003) and random (Oh et al. 2004) strategies have been introduced in the literature. An overall review of this issue is presented in Monirul Kabir et al. (2011). With respect to the evaluation criterion, feature selection approaches can be classified into three categories (Monirul Kabir et al. 2011): the filter, the wrapper and the hybrid approach. The wrapper approach (Hsu et al. 2002; Verikas and Bacauskiene 2002; Wang et al. 2008; Zhu et al. 2007) assesses a feature subset by the training accuracy of the learning model. The filter approach (Ke et al. 2008; Sun 2007; Fle