A hybrid feature selection scheme for mixed attributes data

Haitao Liu · Ruxiang Wei · Guoping Jiang

Received: 2 August 2012 / Accepted: 28 September 2012 / Published online: 15 March 2013
© SBMAC - Sociedade Brasileira de Matemática Aplicada e Computacional 2013

Abstract Feature selection aims at reducing the number of features in many applications. Existing feature selection approaches mainly deal with classification problems that have either continuous or discrete attributes. However, data usually come with mixed attributes in real-world applications. In this paper, a hybrid feature selection (HFS) scheme is proposed to deal with mixed attributes data. First, a new correlation measure between mixed attributes is defined by giving a model for calculating the mutual information between continuous and discrete attributes; second, the features are evaluated by a filter model with the new correlation measure; finally, feature selection is completed by optimizing the parameter in the filter model against an estimation accuracy criterion. Experimental results show that HFS achieves better stability and estimation accuracy.

Keywords Feature selection · Mixed attributes · Mutual information · Filter · Wrapper · Case-based reasoning
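The first step of the scheme rests on estimating mutual information between a continuous attribute and a discrete one. The paper's own model for this quantity is defined later in the text; as a rough illustration of the general idea only, the sketch below estimates I(X; Y) for a continuous feature X and a discrete class label Y using class-conditional Parzen (Gaussian kernel) density estimates and a Monte Carlo average over the sample. The function name mi_continuous_discrete and the kernel-density choice are assumptions for illustration, not the authors' definition.

```python
# Illustrative sketch only: one possible way to estimate mutual information
# between a continuous attribute x and a discrete class attribute y, in the
# spirit of the continuous-discrete model referred to in the abstract.
import numpy as np
from scipy.stats import gaussian_kde


def mi_continuous_discrete(x, y):
    """Monte Carlo estimate of I(X; Y) in nats for continuous x, discrete y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                    # P(Y = k)

    # Class-conditional densities p(x | Y = k) via Gaussian (Parzen) kernels;
    # each class needs at least two distinct values for the KDE to be defined.
    kdes = {k: gaussian_kde(x[y == k]) for k in classes}

    # p(x_i | Y = k) for every sample/class pair, and the marginal p(x_i).
    cond = np.vstack([kdes[k](x) for k in classes])   # shape (K, n)
    marginal = priors @ cond                          # shape (n,)

    # I(X; Y) = E[log p(x | y) - log p(x)], averaged over the observed pairs.
    row = {k: i for i, k in enumerate(classes)}
    own = cond[[row[k] for k in y], np.arange(len(y))]
    return float(np.mean(np.log(own) - np.log(marginal)))


# Toy check: a class-dependent feature should score higher than pure noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
informative = labels + 0.5 * rng.standard_normal(500)
noise = rng.standard_normal(500)
print(mi_continuous_discrete(informative, labels))  # clearly above zero
print(mi_continuous_discrete(noise, labels))        # close to zero
```

In the HFS scheme as summarized above, such a relevance score would feed the filter model of the second step, with the filter's parameter then tuned against estimation accuracy in the final, wrapper-style step.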

Communicated by José Mario Martinez.

H. Liu (B) · G. Jiang
Department of Equipment E&M, Naval University of Engineering, Wuhan 430033, People's Republic of China
e-mail: [email protected]

R. Wei
College of Science, Naval University of Engineering, Wuhan 430033, People's Republic of China

1 Introduction

Feature selection (also known as variable selection or attribute selection) plays an important role in machine learning and pattern recognition (Hu et al. 2010; Guyon and Elisseeff 2003). It selects the most effective features from the original feature set in order to reduce the dimension of the feature space according to certain criteria (Sheng 2000).
Through feature selection, irrelevant or redundant features are removed, thereby reducing computational complexity, improving the estimation accuracy of the learning model and making the model easier to interpret (Amiri et al. 2011; Cakır et al. 2011). A great number of feature selection approaches have been developed in recent years. Two key issues in constructing a feature selection approach are the search strategy and the evaluation criterion (Yao et al. 2012; Mao et al. 2007). With regard to the search strategy, global (Somol et al. 2004), heuristic (Dash and Liu 2003) and random (Oh et al. 2004) strategies have been introduced in the literature. An overall review of this issue is presented in Monirul Kabir et al. (2011). With respect to the evaluation criterion, feature selection approaches can be classified into three categories (Monirul Kabir et al. 2011): the filter, the wrapper and the hybrid approach. The wrapper approach (Hsu et al. 2002; Verikas and Bacauskiene 2002; Wang et al. 2008; Zhu et al. 2007) assesses a feature subset by the training accuracy of the learning model. The filter approach (Ke et al. 2008; Sun 2007; Fle