Fast feature selection for interval-valued data through kernel density estimation entropy
ORIGINAL ARTICLE
Jianhua Dai¹ · Ye Liu¹ · Jiaolong Chen¹ · Xiaofeng Liu¹

Received: 29 November 2019 / Accepted: 10 April 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

* Corresponding author: Jianhua Dai, [email protected]

¹ Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China
Abstract

Kernel density estimation, a non-parametric method for estimating the probability density distribution of random variables, has been used in feature selection. However, existing feature selection methods based on kernel density estimation seldom consider interval-valued data, even though such data arise widely in practice. In this paper, a feature selection method based on kernel density estimation for interval-valued data is proposed. Firstly, the kernel function used in kernel density estimation is defined for interval-valued data. Secondly, the interval-valued kernel density estimation probability structure is constructed from the defined kernel function, including the kernel density estimation conditional probability, joint probability and posterior probability. Thirdly, kernel density estimation entropies for interval-valued data are proposed on the basis of the constructed probability structure, including the information entropy, conditional entropy and joint entropy of kernel density estimation. Fourthly, we propose a feature selection approach based on kernel density estimation entropy. Moreover, we improve the proposed feature selection algorithm into a fast feature selection algorithm based on kernel density estimation entropy. Finally, comparative experiments are conducted from three perspectives, computing time, intuitive identifiability and classification performance, to show the feasibility and effectiveness of the proposed method.

Keywords Kernel density estimation · Entropy · Feature selection · Kernel function · Interval-valued decision table
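As a rough illustration of the pipeline the abstract describes, the sketch below implements greedy feature selection driven by a KDE-style conditional entropy on interval-valued data. It is a minimal sketch under stated assumptions: the interval distance, the product Gaussian kernel, the bandwidth h, and the posterior estimate are all stand-ins chosen for illustration, not the definitions given in the paper.

import numpy as np

def interval_dist(a, b):
    # Stand-in distance between intervals a = (a_l, a_u) and b = (b_l, b_u);
    # the paper defines its own interval-valued kernel, which this only mimics.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def kernel(xi, xj, subset, h):
    # Product Gaussian kernel over the currently selected interval features.
    s = sum(interval_dist(xi[f], xj[f]) ** 2 for f in subset)
    return np.exp(-0.5 * s / h ** 2)

def conditional_entropy(X, y, subset, h=1.0):
    # H(D | B): entropy of the decision D given feature subset B, with the
    # posterior P(d | x) estimated by kernel-weighted class frequencies.
    n = len(X)
    labels = np.array(y)
    classes = sorted(set(y))
    H = 0.0
    for i in range(n):
        w = np.array([kernel(X[i], X[j], subset, h) for j in range(n)])
        post = np.array([w[labels == c].sum() for c in classes])
        post /= post.sum()
        H -= np.sum(post * np.log(post + 1e-12)) / n
    return H

def greedy_select(X, y, n_features, k, h=1.0):
    # Forward selection: repeatedly add the feature whose inclusion
    # yields the lowest KDE conditional entropy.
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        best = min(remaining,
                   key=lambda f: conditional_entropy(X, y, selected + [f], h))
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage on toy interval-valued data: 4 samples, 2 interval features.
X = [[(0.0, 1.0), (5.0, 6.0)],
     [(0.1, 0.9), (5.2, 6.1)],
     [(3.0, 4.0), (5.1, 5.9)],
     [(3.2, 4.1), (4.9, 6.2)]]
y = [0, 0, 1, 1]
print(greedy_select(X, y, n_features=2, k=1))  # picks feature 0, which separates the classes

The fast algorithm mentioned in the abstract presumably avoids recomputing the full kernel matrix for every candidate subset; the naive loop above is quadratic in the number of samples per evaluation.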
1 Introduction

Feature selection is of great practical significance in real life. Its purpose is to select, from the feature set of the original data, a feature subset that most effectively represents the decision. In this way, we can eliminate attributes that are irrelevant to the decision, reduce the dimensionality of the data, reduce overfitting, and improve the generalization ability of the model. Thus, feature selection has attracted the attention of many researchers [1–9]. In feature selection for numerical data in particular, some researchers [10, 11] discretize the numerical data as a preprocessing step. However, it is worth noting that discretization leads to a loss of information. To avoid discretizing numerical features, we can instead capture the distribution characteristics of the numerical data and estimate their probability density. There are two types of probability density estimation: parametric estimation and non-parametric estimation.
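To make the non-parametric case concrete, a one-dimensional Gaussian kernel density estimator can be written in a few lines of Python; this is the generic textbook estimator, and the bandwidth and sample data below are illustrative only.

import numpy as np

def gaussian_kde(samples, h):
    # f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h), with a Gaussian kernel K.
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    def f_hat(x):
        u = (x - samples) / h
        return np.exp(-0.5 * u ** 2).sum() / (n * h * np.sqrt(2 * np.pi))
    return f_hat

# Estimate the density of a numerical feature without discretizing it.
rng = np.random.default_rng(0)
feature = rng.normal(loc=0.0, scale=1.0, size=200)
f = gaussian_kde(feature, h=0.4)
print(f(0.0))  # roughly the standard normal density (about 0.4) at x = 0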