New multivariate kernel density estimator for uncertain data classification


Byunghoon Kim1,3 · Young‑Seon Jeong2,3 · Myong K. Jeong3,4

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Uncertainty in data occurs in diverse applications due to measurement errors, data incompleteness, and multiple repeated measurements. Several classifiers for uncertain data have been developed to tackle this uncertainty. However, the existing classifiers do not consider the dependencies among uncertain features, even though these dependencies have a critical effect on classification accuracy. Therefore, we propose a new Bayesian classification model that considers the correlation among uncertain features. To handle the uncertainty of data, new multivariate kernel density estimators are developed to estimate the class-conditional probability density function of categorical, continuous, and mixed uncertain data. Experimental results with simulated and real-life data sets show that the proposed approach achieves higher classification accuracy than the existing approaches for uncertain data.

Keywords  Uncertain classification · Kernel density estimator · Bayesian classifier · Semiconductor DRAM

Annals of Operations Research

* Young‑Seon Jeong [email protected]
Myong K. Jeong [email protected]

1 Department of Industrial and Management Engineering, Hanyang University, Ansan, Korea
2 Department of Industrial Engineering, Chonnam National University, Gwangju, Korea
3 Department of Industrial and Systems Engineering, Rutgers University, New Brunswick, NJ, USA
4 Rutgers Center for Operations Research, Rutgers University, New Brunswick, NJ, USA

1 Introduction
Classification is a supervised learning approach that assigns unseen instances to their corresponding classes. Traditional classification algorithms include decision trees, neural networks, support vector machines, and Bayesian classifiers. These algorithms are widely used in applications such as fault diagnosis in semiconductor manufacturing processes, medical diagnosis, image classification, and customer relationship management (Chaovalitwongse et al. 2011; Hastie et al. 2009; Jeong et al. 2008; Kim 2015; Lee and Jun 2015; Sariannidis et al. 2019; Wang et al. 2018). Although traditional classification models assume that data values are certain, uncertain data are inherent in many applications (Aggarwal 2007). Many factors contribute to the uncertainty of data, e.g., imprecision of measuring equipment, data randomness and incompleteness, and delayed updates (Hülsmann and Brockmann 2012; Pei et al. 2007; Scott 2015; Tavakkol et al. 2017). Another source of uncertainty is multiple repeated measurements. For example, a patient’s body temperature can be measured multiple times, and the resulting records can be inconsistent because of measurement errors. In addition, some data values change continuously, such as positions of mobile devices or observations associated with natural phenomena. At l