Fast feature selection for interval-valued data through kernel density estimation entropy
ORIGINAL ARTICLE
Jianhua Dai¹ · Ye Liu¹ · Jiaolong Chen¹ · Xiaofeng Liu¹

Received: 29 November 2019 / Accepted: 10 April 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

* Corresponding author: Jianhua Dai, [email protected]

¹ Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China
Abstract

Kernel density estimation, a non-parametric method for estimating the probability density distribution of random variables, has been used in feature selection. However, existing feature selection methods based on kernel density estimation seldom consider interval-valued data, even though such data arise widely in practice. In this paper, a feature selection method based on kernel density estimation for interval-valued data is proposed. Firstly, the kernel function used in kernel density estimation is defined for interval-valued data. Secondly, the interval-valued kernel density estimation probability structure is constructed from the defined kernel function, including the kernel density estimation conditional probability, joint probability and posterior probability. Thirdly, kernel density estimation entropies for interval-valued data are proposed on the basis of the constructed probability structure, including the information entropy, conditional entropy and joint entropy of kernel density estimation. Fourthly, we propose a feature selection approach based on kernel density estimation entropy. Moreover, we improve the proposed feature selection algorithm into a fast feature selection algorithm based on kernel density estimation entropy. Finally, comparative experiments are conducted from three perspectives, computing time, intuitive identifiability and classification performance, to show the feasibility and effectiveness of the proposed method.

Keywords Kernel density estimation · Entropy · Feature selection · Kernel function · Interval-valued decision table
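As a rough illustration of the pipeline the abstract describes, the sketch below implements greedy feature selection driven by a KDE-style conditional entropy on interval-valued data. It is a minimal sketch under stated assumptions: the interval distance, the product Gaussian kernel, the bandwidth h, and the posterior estimate are all stand-ins chosen for illustration, not the definitions given in the paper.

import numpy as np

def interval_dist(a, b):
    # Stand-in distance between intervals a = (a_l, a_u) and b = (b_l, b_u);
    # the paper defines its own interval-valued kernel, which this only mimics.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def kernel(xi, xj, subset, h):
    # Product Gaussian kernel over the currently selected interval features.
    s = sum(interval_dist(xi[f], xj[f]) ** 2 for f in subset)
    return np.exp(-0.5 * s / h ** 2)

def conditional_entropy(X, y, subset, h=1.0):
    # H(D | B): entropy of the decision D given feature subset B, with the
    # posterior P(d | x) estimated by kernel-weighted class frequencies.
    n = len(X)
    labels = np.array(y)
    classes = sorted(set(y))
    H = 0.0
    for i in range(n):
        w = np.array([kernel(X[i], X[j], subset, h) for j in range(n)])
        post = np.array([w[labels == c].sum() for c in classes])
        post /= post.sum()
        H -= np.sum(post * np.log(post + 1e-12)) / n
    return H

def greedy_select(X, y, n_features, k, h=1.0):
    # Forward selection: repeatedly add the feature whose inclusion
    # yields the lowest KDE conditional entropy.
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        best = min(remaining,
                   key=lambda f: conditional_entropy(X, y, selected + [f], h))
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage on toy interval-valued data: 4 samples, 2 interval features.
X = [[(0.0, 1.0), (5.0, 6.0)],
     [(0.1, 0.9), (5.2, 6.1)],
     [(3.0, 4.0), (5.1, 5.9)],
     [(3.2, 4.1), (4.9, 6.2)]]
y = [0, 0, 1, 1]
print(greedy_select(X, y, n_features=2, k=1))  # picks feature 0, which separates the classes

The fast algorithm mentioned in the abstract presumably avoids recomputing the full kernel matrix for every candidate subset; the naive loop above is quadratic in the number of samples per evaluation.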
1 Introduction

Feature selection is of great practical significance in real life. Its purpose is to select, from the feature set of the original data, a feature subset that most effectively represents the decision. In this way, we can eliminate attributes that are irrelevant to the decision, reduce the dimensionality of the data, reduce overfitting, and improve the generalization ability of the model. Thus, feature selection has attracted the attention of many researchers [1–9]. In feature selection for numerical data in particular, some researchers [10, 11] discretize the numerical data as a preprocessing step. However, it is worth noting that discretization leads to a loss of information. To avoid discretizing numerical features, we can instead capture the distribution characteristics of the numerical data and estimate their probability density. There are two types of probability density estimation: parametric estimation and non-parametric estimation.
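To make the non-parametric case concrete, a one-dimensional Gaussian kernel density estimator can be written in a few lines of Python; this is the generic textbook estimator, and the bandwidth and sample data below are illustrative only.

import numpy as np

def gaussian_kde(samples, h):
    # f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h), with a Gaussian kernel K.
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    def f_hat(x):
        u = (x - samples) / h
        return np.exp(-0.5 * u ** 2).sum() / (n * h * np.sqrt(2 * np.pi))
    return f_hat

# Estimate the density of a numerical feature without discretizing it.
rng = np.random.default_rng(0)
feature = rng.normal(loc=0.0, scale=1.0, size=200)
f = gaussian_kde(feature, h=0.4)
print(f(0.0))  # roughly the standard normal density (about 0.4) at x = 0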