Dynamic feature selection method with minimum redundancy information for linear data
HongFang Zhou1 · Jing Wen1
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract Feature selection plays a fundamental role in many data mining and machine learning tasks. In this paper, we propose a novel feature selection method, namely, the Dynamic Feature Selection Method with Minimum Redundancy Information (MRIDFS). In MRIDFS, conditional mutual information is used to calculate the relevance and the redundancy among multiple features, and a new concept, the feature-dependent redundancy ratio, is introduced. This ratio represents redundancy more accurately. To evaluate our method, MRIDFS is tested and compared with seven popular methods on 16 benchmark data sets. Experimental results show that MRIDFS outperforms them in terms of average classification accuracy.

Keywords Feature selection · Mutual information · Conditional redundancy · Linear data
1 Introduction

Classification plays a fundamental and important role in pattern recognition and data mining [1, 2]. In most cases, some irrelevant or redundant features are involved, which makes feature selection necessary. Generally speaking, feature selection methods can be divided into three types: the wrapper method [3–6], the embedded method [7–11], and the filter method [12–16]. The wrapper method is classifier-dependent; it uses a particular classifier to obtain a subset with high accuracy, but at a higher computational cost. The embedded method integrates feature selection into the machine learning training process, and it depends on the classifier to some degree. In order to select the relevant features, it is required to construct a model [17]. The embedded method has lower computational expense and is less prone to overfitting, but constructing such a model is difficult. The filter method is classifier-independent: it ranks the features according to their relevance to the class labels in supervised learning.
Consequently, the filter method is prevailing due to its simplicity and high computational efficiency. Many criteria have been proposed to calculate the relevance score, such as distance, mutual information [18, 19], correlation [20, 21], and consistency measures. Information theory has been widely applied to the filter method because it is an effective criterion for evaluating both linear and nonlinear relations among variables [2, 22, 23]; a minimal sketch of such a mutual-information-based ranking is given at the end of this section. Therefore, such theoretical analysis is our focus in this paper.

The rest of the paper is organized as follows. Some popular filter methods are reviewed in Section 2. Some basic concepts of information theory [24] are described in Section 3. Our feature selection method is described in detail in Section 4. We give the experimental results and analysis in Section 5. Finally, we conclude our work in Section 6.
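To make the filter paradigm concrete, the following Python sketch ranks features by their estimated mutual information with the class label and keeps the top k. It illustrates only the generic univariate filter scheme discussed above, not MRIDFS itself; the wine data set, the scikit-learn estimator, and the subset size k are illustrative assumptions rather than choices made in this paper.

```python
# A minimal sketch of mutual-information-based filter ranking (not the
# MRIDFS algorithm): each feature is scored by its estimated mutual
# information with the class label, and the top-k features are kept.
# The data set and k are hypothetical placeholders.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif

X, y = load_wine(return_X_y=True)   # example benchmark data
k = 5                               # hypothetical subset size

# Relevance score: estimated I(feature; class) for every feature.
scores = mutual_info_classif(X, y, random_state=0)

# Rank features by relevance and keep the k highest-scoring ones.
top_k = np.argsort(scores)[::-1][:k]
print("selected feature indices:", top_k)
```

A purely univariate ranking like this ignores redundancy between the selected features themselves, which is precisely the limitation that redundancy-aware criteria such as the one proposed in this paper aim to address.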
2 Related work