An empirical classification procedure for nonparametric mixture models


Online ISSN 2005-2863 Print ISSN 1226-3192

RESEARCH ARTICLE

Qiang Zhao¹ · Rohana J. Karunamuni² · Jingjing Wu³

Received: 30 April 2019 / Accepted: 10 December 2019
© Korean Statistical Society 2020

Abstract

Suppose that there are two populations mixed in proportions λ and (1 − λ), respectively, and an investigator wishes to classify an individual into one of these two populations based on a p-dimensional observation on the individual. This is the basic classification problem, with applications in a wide variety of fields. In practice, the optimal rule (Bayes rule) is not available and thus needs to be estimated when either the densities of the populations or the mixing proportion λ is not completely specified. This paper presents a nonparametric classification procedure based on kernel estimates for the most general case, in which both the densities and the mixing proportion are unknown. The error rate of the proposed procedure is calculated and compared with that of the optimal rule. The convergence rate of the difference in error rates is also established. A Monte Carlo simulation study and a real data example are given to compare the proposed rule with the optimal rule for a variety of cases.

Keywords Classification · Misclassification probability · Bayes rule · Kernel estimate

Mathematics Subject Classification Primary 62F10 · 62E20; Secondary 60F05

Journal of the Korean Statistical Society

✉ Jingjing Wu [email protected]

¹ School of Mathematics and Statistics, Shandong Normal University, Jinan 250014, Shandong, China
² Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2G1, Canada
³ Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada

1 Introduction

Classification or discrimination is an important problem with applications in many fields, such as biology, genetics, medicine, and economics. The classification problem arises when an investigator wishes to classify an individual into one of two populations on the basis of a p-dimensional measurement on the individual. Suppose that individuals come from one of two populations, F and G, with respective probabilities λ and (1 − λ), where 0 < λ < 1. The population of the individual is given by the random variable D, where D = 1 if the individual belongs to G and D = 0 otherwise. A p-dimensional random vector Z is measured on each individual. A decision is to be made as to which population an individual belongs to on the basis of an observed value of Z. A classification rule is a partition of R^p into two regions, R₁ and R₂, such that an individual is classified as belonging to F if z ∈ R₁ and to G if z ∈ R₂. A classification rule is considered optimal if it minimizes the probability of misclassification of a random individual. Denote f(z | D = 0) = f(z) and f(z | D = 1) = g(z), where f and g are the density functions of F and G, respectively, with respect to (w.r.t.) Lebesgue measure. Then the optim