New multivariate kernel density estimator for uncertain data classification


Byunghoon Kim¹,³ · Young‑Seon Jeong²,³ · Myong K. Jeong³,⁴

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Uncertainty in data occurs in diverse applications due to measurement errors, data incompleteness, and multiple repeated measurements. Several classifiers for uncertain data have been developed to tackle this uncertainty. However, the existing classifiers do not consider the dependencies among uncertain features, even though this dependency has a critical effect on classification accuracy. Therefore, we propose a new Bayesian classification model that considers the correlation among uncertain features. To handle the uncertainty of data, new multivariate kernel density estimators are developed to estimate the class conditional probability density function of categorical, continuous, and mixed uncertain data. Experimental results with simulated data and real-life data sets show that the proposed approach is better than the existing approaches for classification of uncertain data in terms of classification accuracy.

Keywords Uncertain classification · Kernel density estimator · Bayesian classifier · Semiconductor DRAM
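To make the general idea concrete, the following is a minimal sketch of a Bayesian classifier whose class conditional densities come from a multivariate kernel density estimate with a full (non-diagonal) bandwidth matrix, so that correlation among features is taken into account. It is not the estimator proposed in the paper. It assumes continuous features only, represents each uncertain object by its repeated measurements, pools those measurements per class, and uses a Gaussian kernel with a Scott-type bandwidth. The names `gaussian_kde_density`, `KdeBayesClassifier`, and `make_uncertain_objects` are hypothetical and introduced only for illustration.

```python
import numpy as np


def gaussian_kde_density(points, samples, bandwidth_cov):
    """Average of full-covariance Gaussian kernels centred at `samples`,
    evaluated at each row of `points`."""
    d = samples.shape[1]
    inv = np.linalg.inv(bandwidth_cov)
    norm = 1.0 / np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(bandwidth_cov))
    diff = points[:, None, :] - samples[None, :, :]      # shape (m, n, d)
    maha = np.einsum("mnd,de,mne->mn", diff, inv, diff)  # squared Mahalanobis distances
    return norm * np.exp(-0.5 * maha).mean(axis=1)       # shape (m,)


class KdeBayesClassifier:
    """Illustrative Bayes classifier; an assumption-laden sketch, not the paper's model."""

    def fit(self, objects, labels):
        # objects: list of (r_i, d) arrays holding the repeated measurements of each object
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.priors_, self.samples_, self.bandwidths_ = {}, {}, {}
        for c in self.classes_:
            pooled = np.vstack([o for o, y in zip(objects, labels) if y == c])
            n, d = pooled.shape
            factor = n ** (-1.0 / (d + 4))  # Scott-type rule of thumb (assumption)
            # A full covariance keeps the dependency among features in the kernel.
            self.bandwidths_[c] = np.atleast_2d(np.cov(pooled, rowvar=False)) * factor**2
            self.samples_[c] = pooled
            self.priors_[c] = np.mean(labels == c)
        return self

    def predict(self, objects):
        preds = []
        for o in objects:
            # Score = prior times the density averaged over the object's measurements.
            scores = [
                self.priors_[c]
                * gaussian_kde_density(o, self.samples_[c], self.bandwidths_[c]).mean()
                for c in self.classes_
            ]
            preds.append(self.classes_[int(np.argmax(scores))])
        return np.array(preds)


def make_uncertain_objects(rng, mean, n_objects, n_repeats, noise=0.2):
    """Synthetic data: correlated true values plus noisy repeated measurements."""
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    truth = rng.multivariate_normal(mean, cov, size=n_objects)
    return [t + noise * rng.standard_normal((n_repeats, 2)) for t in truth]
```

A classifier of this kind would be trained with `KdeBayesClassifier().fit(objects, labels)` and applied with `predict(new_objects)`, where every object is an array of repeated measurements. Replacing the full bandwidth matrix with a diagonal one would ignore the correlation among features.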

* Young‑Seon Jeong
Myong K. Jeong

1 Department of Industrial and Management Engineering, Hanyang University, Ansan, Korea
2 Department of Industrial Engineering, Chonnam National University, Gwangju, Korea
3 Department of Industrial and Systems Engineering, Rutgers University, New Brunswick, NJ, USA
4 Rutgers Center for Operations Research, Rutgers University, New Brunswick, NJ, USA

Annals of Operations Research

1 Introduction

Classification is a supervised learning task that assigns unseen instances to their corresponding classes. Traditional classification algorithms include decision trees, neural networks, support vector machines, and Bayesian classifiers. These algorithms are widely used in many applications such as fault diagnosis in semiconductor manufacturing processes, medical diagnosis, image classification, and customer relationship management (Chaovalitwongse et al. 2011; Hastie et al. 2009; Jeong et al. 2008; Kim 2015; Lee and Jun 2015; Sariannidis et al. 2019; Wang et al. 2018). Although traditional classification models assume that data values are certain, uncertain data are inherent in many applications (Aggarwal 2007). Many factors contribute to the uncertainty of data, e.g., imprecision of measuring equipment, data randomness and incompleteness, and delayed updates (Hülsmann and Brockmann 2012; Pei et al. 2007; Scott 2015; Tavakkol et al. 2017). Another source of uncertainty is multiple repeated measurements. For example, while a patient’s body temperature can be measured multiple times, the multiple records can be inconsistent because of measurement errors. In addition, some data values change continuously, such as the positions of mobile devices or observations associated with natural phenomena. At l
