Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric
Yongchan Kwon1 · Wonyoung Kim1 · Masashi Sugiyama2,3 · Myunghee Cho Paik1

Received: 3 May 2019 / Revised: 7 July 2019 / Accepted: 6 September 2019

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019
Abstract

We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning). Recent PU learning methods have shown strong performance both theoretically and empirically. However, most existing algorithms may not be suitable for large-scale datasets because they require repeated computations of a large Gram matrix or massive hyperparameter optimization. In this paper, we propose a computationally efficient and theoretically grounded PU learning algorithm. The proposed algorithm produces a closed-form classifier when the hypothesis space is a closed ball in a reproducing kernel Hilbert space. In addition, we establish upper bounds on the estimation error and the excess risk. The obtained estimation error bound is sharper than existing results, and the derived excess risk bound has an explicit form that vanishes as the sample sizes increase. Finally, we conduct extensive numerical experiments on both synthetic and real datasets, demonstrating the improved accuracy, scalability, and robustness of the proposed algorithm.

Keywords Positive and unlabeled learning · Integral probability metric · Excess risk bound · Approximation error · Reproducing kernel Hilbert space
Editors: Kee-Eung Kim and Jun Zhu.
Myunghee Cho Paik [email protected]
Yongchan Kwon [email protected]
Wonyoung Kim [email protected]
Masashi Sugiyama [email protected]
1 Department of Statistics, Seoul National University, Seoul, South Korea
2 Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
3 Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
1 Introduction

Supervised binary classification assumes that all training data are labeled as either positive or negative. However, in many practical scenarios, collecting a large number of labeled samples from both categories is costly, difficult, or even impossible. In contrast, unlabeled data are relatively cheap and abundant. As a consequence, semi-supervised learning has been developed to exploit partially labeled data (Chapelle et al. 2006). In this paper, as a special case of semi-supervised learning, we consider Positive-Unlabeled (PU) learning, the problem of building a binary classifier from only positive and unlabeled samples (Denis et al. 2005; Li and Liu 2005). PU learning provides a powerful framework when negative labels are impossible or very expensive to obtain, and thus it has frequently appeared in many real-world applications. Examples include document classification (Elkan and Noto 2008; Xiao et al. 2011) and image classification (Zuluaga et al
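To make the PU setting concrete, the following minimal Python sketch simulates positive and unlabeled samples from a synthetic dataset and then fits a naive closed-form kernel classifier that treats the unlabeled sample as negative. All names, the Gaussian kernel, the labeling fraction, and the regularization constant are illustrative assumptions; this naive baseline is not the weighted-IPM estimator proposed in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully labeled data (y in {+1, -1}); in a real PU
# application the negative labels generated here are never observed.
n = 1000
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0.0, 1, -1)

# Simulate the PU setting: only a fraction of the positives is labeled,
# and every remaining sample (positive or negative) stays unlabeled.
label_frac = 0.3  # assumed labeling frequency, not taken from the paper
pos_idx = np.flatnonzero(y == 1)
labeled = rng.choice(pos_idx, size=int(label_frac * pos_idx.size), replace=False)
X_p = X[labeled]                      # labeled positive sample
X_u = np.delete(X, labeled, axis=0)   # unlabeled sample (a class mixture)

# A generic closed-form kernel classifier: regularized least squares on
# "positive vs. unlabeled" targets. By the representer theorem the
# solution has the form f(x) = sum_i alpha_i k(z_i, x), with alpha
# obtained from a single linear system.
def gaussian_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

Z = np.vstack([X_p, X_u])
t = np.concatenate([np.ones(len(X_p)), -np.ones(len(X_u))])
lam = 1e-2                            # illustrative regularization constant
K = gaussian_kernel(Z, Z)
alpha = np.linalg.solve(K + lam * len(Z) * np.eye(len(Z)), t)

scores = gaussian_kernel(X, Z) @ alpha
print("train accuracy of naive baseline:", np.mean(np.sign(scores) == y))

Treating the unlabeled sample as if it were negative, as this baseline does, is known to bias the resulting classifier because the unlabeled sample is in fact a mixture of positive and negative observations; correcting for this mixture is precisely what principled PU learning methods are designed to do.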