Dynamic feature selection method with minimum redundancy information for linear data

HongFang Zhou · Jing Wen

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Feature selection plays a fundamental role in many data mining and machine learning tasks. In this paper, we propose a novel feature selection method, namely, the Dynamic Feature Selection Method with Minimum Redundancy Information (MRIDFS). In MRIDFS, conditional mutual information is used to calculate the relevance and the redundancy among multiple features, and a new concept, the feature-dependent redundancy ratio, is introduced. This ratio can represent redundancy more accurately. To evaluate our method, MRIDFS is tested and compared with seven popular methods on 16 benchmark data sets. Experimental results show that MRIDFS outperforms the compared methods in terms of average classification accuracy.

Keywords Feature selection · Mutual information · Conditional redundancy · Linear data

1 Introduction

Classification plays a fundamental role in pattern recognition and data mining [1, 2]. In most cases, the data contain some irrelevant or redundant features, and feature selection is needed to deal with them. Generally speaking, feature selection methods can be divided into three types: the wrapper method [3–6], the embedded method [7–11], and the filter method [12–16]. The wrapper method is classifier-dependent: it uses a particular classifier to find a feature subset with high accuracy, but it has a higher computational cost. The embedded method usually integrates feature selection into the training process of the learning algorithm, so it depends on the classifier to some degree. To select the relevant features, it requires a model to be constructed [17]. This method has a lower computational expense and is not prone to overfitting, but such a model is difficult to construct. The filter method is classifier-independent: in supervised learning, it ranks the features according to their relevance to the class labels. Consequently, the filter method is prevalent due to its simplicity and high computational efficiency.

Many criteria have been proposed to calculate the relevance score, such as distance, mutual information [18, 19], correlation [20, 21], and consistency measures. Information theory has been widely applied to the filter method because it provides effective criteria for evaluating both linear and nonlinear relations among variables [2, 22, 23]. Therefore, information-theoretic analysis is our focus in this paper.

The rest of the paper is organized as follows. Some popular filter methods are reviewed in Section 2. Some basic concepts of information theory [24] are described in Section 3. Our feature selection method is described in detail in Section 4. We give the experimental results and analysis in Section 5. Finally, we conclude our work in Section 6.
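
As a minimal illustration of the filter approach described in this section, the sketch below ranks discrete features by their estimated mutual information with the class labels. It is not the MRIDFS criterion: it scores relevance only and ignores redundancy among features. The function names (`mutual_information`, `rank_features`) and the toy data are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature_name, score) pairs sorted by decreasing relevance I(f;C)."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: f1 copies the label, f2 is independent of it.
labels = [0, 0, 1, 1]
columns = {
    "f1": [0, 0, 1, 1],
    "f2": [0, 1, 0, 1],
}
ranking = rank_features(columns, labels)
print(ranking)
```

On this toy data, `f1` is ranked first because it determines the label exactly, while `f2` carries no information about it. A pure relevance ranking of this kind would score two identical copies of `f1` equally highly, which is the redundancy problem that motivates criteria such as the one proposed in this paper.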

2 Rela