A new feature selection using dynamic interaction



THEORETICAL ADVANCES

Zhang Li 1,2
Received: 13 May 2019 / Accepted: 9 September 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
With the continuous development of Internet technology, data are becoming increasingly complicated and high-dimensional. Such high-dimensional data contain large numbers of redundant and irrelevant features, which pose great challenges to existing machine learning algorithms. Feature selection is one of the important research topics in machine learning, pattern recognition and data mining, and it is also an important step in the data preprocessing stage. Feature selection seeks an optimal feature subset of the original feature set, in order to improve classification accuracy and reduce machine learning time. Traditional feature selection algorithms tend to ignore features whose individual discriminative capacity is weak but which are strongly discriminative as a group. Therefore, a new dynamic interaction feature selection (DIFS) algorithm is proposed in this paper. First, it redefines feature relevance, irrelevance and redundancy under the theoretical framework of interaction information. Second, it gives the formulas for computing interaction information. Finally, on eleven UCI data sets and with three different classifiers, namely KNN, SVM and C4.5, the DIFS algorithm increases the classification accuracy over the FullSet by 3.2848% and reduces the number of selected features by 15.137 on average. Hence, the DIFS algorithm can not only identify relevant features effectively, but also identify irrelevant and redundant features. Moreover, it can effectively improve the classification accuracy on the data sets and reduce their feature dimensionality.

Keywords Feature selection · Feature interaction · Feature relevance · Feature redundancy · Filter method
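The interaction information that the abstract builds on can be estimated directly from discrete data. Below is a minimal illustrative sketch (not the paper's DIFS algorithm; the function names are the author's of this sketch), computing the mutual information I(X;C) between a feature and the class from frequency counts, and the three-way interaction information I(X;Y;C) = I(X,Y;C) − I(X;C) − I(Y;C), which is positive exactly when two features jointly tell more about the class than they do separately, the "weak as a monomer, strong as a group" case the paper targets:

```python
from collections import Counter
from math import log2

def mutual_information(xs, cs):
    """Estimate I(X;C) in bits from paired discrete samples."""
    n = len(xs)
    px = Counter(xs)
    pc = Counter(cs)
    pxc = Counter(zip(xs, cs))
    return sum(
        (cnt / n) * log2((cnt / n) / ((px[x] / n) * (pc[c] / n)))
        for (x, c), cnt in pxc.items()
    )

def interaction_information(xs, ys, cs):
    """I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C).

    Positive values mean X and Y together carry more information
    about the class C than the sum of their individual contributions."""
    xy = list(zip(xs, ys))
    return (mutual_information(xy, cs)
            - mutual_information(xs, cs)
            - mutual_information(ys, cs))

# XOR-like class labels: each feature alone is uninformative,
# but the pair determines the class -- a classic interacting pair.
X = [0, 0, 1, 1]
Y = [0, 1, 0, 1]
C = [0, 1, 1, 0]
print(round(mutual_information(X, C), 6))          # 0.0
print(round(interaction_information(X, Y, C), 6))  # 1.0
```

A filter method that scores features only by individual I(X;C) would discard both X and Y here, even though together they predict C perfectly; this is the failure mode that interaction-aware selection is meant to avoid.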

* Zhang Li, [email protected]

1 School of Computer Engineering, Jiangsu University of Technology, Changzhou 213001, Jiangsu, China

2 Key Laboratory of Trustworthy Distributed Computing and Service (Ministry of Education), Beijing University of Posts and Telecommunications, Beijing 100876, China

1 Introduction

In the field of machine learning and pattern recognition [1], feature selection based on classification models has attracted wide attention from researchers. It aims to find an optimal feature subset of the original data set, one that can represent the original data set. Its advantages [2] are that it can reduce machine learning time, avoid overfitting, cut down the physical storage of the data set and improve the classification accuracy of the algorithm. From the point of view of the subset evaluation function, feature selection algorithms can be divided into three categories [3]: embedded, wrapper and filter methods. Embedded methods and wrapper methods usually