Feature Selection Method Based on Differential Correlation Information Entropy



Xiujuan Wang¹ · Yixuan Yan¹ · Xiaoyue Ma¹

¹ Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Correspondence: Yixuan Yan (corresponding author) [email protected] · Xiujuan Wang [email protected] · Xiaoyue Ma [email protected]

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Feature selection is one of the major aspects of pattern classification systems. In previous studies, Ding and Peng recognized the importance of feature selection and proposed a minimum-redundancy feature selection method to minimize redundant features during sequential selection in microarray gene expression data. However, because the minimum-redundancy method relies mainly on mutual information, which measures the dependency only between pairs of random variables, its results cannot be optimal without evaluating the feature subset globally. Therefore, within the minimum redundancy-maximum relevance framework, this paper introduces entropy to evaluate feature subsets globally and proposes a new subset evaluation criterion, differential correlation information entropy. In our criterion, different bivariate correlation metrics can be substituted, and feature selection is completed through sequential forward search. Two different classification models are applied to eleven standard data sets from the UCI machine learning repository to compare our method with algorithms such as mRMR, ReliefF, and the feature selection method with joint maximal information entropy. The experimental results show that feature selection based on our proposed method is clearly superior to that of the other models.

Keywords Differential correlation information entropy · mRMR · Classification · Feature selection
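To make the pipeline described in the abstract concrete, the following minimal sketch shows a sequential forward search driven by a pluggable subset criterion. The criterion used here (mean absolute Pearson correlation with the label minus mean pairwise feature correlation) is only a stand-in for illustration; it is not the differential correlation information entropy defined later in the paper, and all names in the sketch are hypothetical.

```python
# Minimal sketch of sequential forward search (SFS) with a pluggable
# subset-evaluation criterion. subset_score below is a placeholder,
# NOT the paper's differential correlation information entropy.
import numpy as np

def pearson(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

def subset_score(X, y, subset):
    """Placeholder criterion: relevance to y minus redundancy within the subset."""
    relevance = np.mean([pearson(X[:, j], y) for j in subset])
    if len(subset) < 2:
        return relevance
    pairs = [(i, j) for i in subset for j in subset if i < j]
    redundancy = np.mean([pearson(X[:, i], X[:, j]) for i, j in pairs])
    return relevance - redundancy

def sequential_forward_search(X, y, k):
    """Greedily grow a feature subset of size k, one best feature at a time."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = max(remaining, key=lambda j: subset_score(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 100 samples, 6 features, label driven by features 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X[:, 0] + X[:, 2] + 0.1 * rng.normal(size=100)
print(sequential_forward_search(X, y, 3))
```

Substituting the paper's entropy-based criterion for subset_score would yield the proposed method's search procedure; the greedy loop itself is unchanged.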

1 Introduction

In recent years, the exponential growth of data volume in various industries has brought new challenges to machine learning research. Irrelevant and redundant data features increase the computational complexity of machine learning models, which greatly affects the accuracy and efficiency of model learning, a phenomenon also called "the curse of dimensionality". Feature selection can eliminate irrelevant and redundant features to reduce spatial complexity and improve the accuracy and efficiency of machine learning models. The advantages of feature selection can be summarized as follows: (1) dimension reduction decreases the computational complexity of the learning models; (2) noise reduction improves classification accuracy; and (3) more interpretable features contribute to identifying and monitoring target diseases or function types [1]. The purpose of feature selection is to reduce the dimensionality of the data by removing features that are irrelevant or redundant [2]. For microarray gene expression data, Ding and Peng proposed a filter-based method, minimum redundancy maximum relevancy (mRMR), to find the optimum subset of genes [1,3]; a greedy form of its selection rule is sketched below.
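As a concrete illustration of the greedy filter idea, the sketch below implements the common "difference" form of the mRMR rule: at each step, add the unselected feature f that maximizes I(f; y) minus its mean mutual information with the already selected features. This is a minimal sketch, not Ding and Peng's reference implementation; the use of scikit-learn's nearest-neighbor mutual information estimators (mutual_info_classif, mutual_info_regression) and the toy data are assumptions for demonstration.

```python
# Compact sketch of the mRMR "difference" rule:
# pick argmax_f [ I(f; y) - mean_{s in S} I(f; s) ] at each greedy step.
# Illustrative only; estimator choices are assumptions, not the original code.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y, random_state=0)  # I(f; y) per feature
    selected = [int(np.argmax(relevance))]                 # seed with most relevant
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean mutual information with already-selected features.
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                for s in selected
            ])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy usage: the label depends on features 0 and 1; feature 2 nearly copies 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=200)            # redundant feature
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(mrmr(X, y, 2))
```

On this toy data, the near-copy (feature 2) is heavily penalized by the redundancy term once feature 0 is selected, so the second pick is typically the genuinely complementary feature 1.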