Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric
Yongchan Kwon1 · Wonyoung Kim1 · Masashi Sugiyama2,3 · Myunghee Cho Paik1

Received: 3 May 2019 / Revised: 7 July 2019 / Accepted: 6 September 2019

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019
Abstract

We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning). Recent PU learning methods have shown strong performance both theoretically and empirically. However, most existing algorithms may not be suitable for large-scale datasets because they require repeated computations of a large Gram matrix or massive hyperparameter optimization. In this paper, we propose a computationally efficient and theoretically grounded PU learning algorithm. The proposed algorithm produces a closed-form classifier when the hypothesis space is a closed ball in a reproducing kernel Hilbert space. In addition, we establish upper bounds on the estimation error and the excess risk. The obtained estimation error bound is sharper than existing results, and the derived excess risk bound has an explicit form that vanishes as the sample sizes increase. Finally, we conduct extensive numerical experiments on both synthetic and real datasets, demonstrating the improved accuracy, scalability, and robustness of the proposed algorithm.

Keywords Positive and unlabeled learning · Integral probability metric · Excess risk bound · Approximation error · Reproducing kernel Hilbert space
Editors: Kee-Eung Kim and Jun Zhu.
Myunghee Cho Paik [email protected]
Yongchan Kwon [email protected]
Wonyoung Kim [email protected]
Masashi Sugiyama [email protected]
1 Department of Statistics, Seoul National University, Seoul, South Korea
2 Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
3 Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
1 Introduction

Supervised binary classification assumes that all training data are labeled as either positive or negative. However, in many practical scenarios, collecting a large number of labeled samples from both categories is costly, difficult, or even impossible. In contrast, unlabeled data are relatively cheap and abundant. As a consequence, semi-supervised learning has been developed to exploit partially labeled data (Chapelle et al. 2006). In this paper, as a special case of semi-supervised learning, we consider Positive-Unlabeled (PU) learning, the problem of building a binary classifier from only positive and unlabeled samples (Denis et al. 2005; Li and Liu 2005). PU learning provides a powerful framework when negative labels are impossible or very expensive to obtain, and thus it has frequently appeared in many real-world applications. Examples include document classification (Elkan and Noto 2008; Xiao et al. 2011) and image classification (Zuluaga et al
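To make the PU setting concrete, the following minimal Python sketch simulates positive and unlabeled samples from a synthetic dataset and then fits a naive closed-form kernel classifier that treats the unlabeled sample as negative. All names, the Gaussian kernel, the labeling fraction, and the regularization constant are illustrative assumptions; this naive baseline is not the weighted-IPM estimator proposed in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully labeled data (y in {+1, -1}); in a real PU
# application the negative labels generated here are never observed.
n = 1000
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0.0, 1, -1)

# Simulate the PU setting: only a fraction of the positives is labeled,
# and every remaining sample (positive or negative) stays unlabeled.
label_frac = 0.3  # assumed labeling frequency, not taken from the paper
pos_idx = np.flatnonzero(y == 1)
labeled = rng.choice(pos_idx, size=int(label_frac * pos_idx.size), replace=False)
X_p = X[labeled]                      # labeled positive sample
X_u = np.delete(X, labeled, axis=0)   # unlabeled sample (a class mixture)

# A generic closed-form kernel classifier: regularized least squares on
# "positive vs. unlabeled" targets. By the representer theorem the
# solution has the form f(x) = sum_i alpha_i k(z_i, x), with alpha
# obtained from a single linear system.
def gaussian_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

Z = np.vstack([X_p, X_u])
t = np.concatenate([np.ones(len(X_p)), -np.ones(len(X_u))])
lam = 1e-2                            # illustrative regularization constant
K = gaussian_kernel(Z, Z)
alpha = np.linalg.solve(K + lam * len(Z) * np.eye(len(Z)), t)

scores = gaussian_kernel(X, Z) @ alpha
print("train accuracy of naive baseline:", np.mean(np.sign(scores) == y))

Treating the unlabeled sample as if it were negative, as this baseline does, is known to bias the resulting classifier because the unlabeled sample is in fact a mixture of positive and negative observations; correcting for this mixture is precisely what principled PU learning methods are designed to do.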