Binary classification with ambiguous training data


Naoya Otani¹ · Yosuke Otsubo¹ · Tetsuya Koike¹ · Masashi Sugiyama²,³

Received: 16 April 2020 / Revised: 29 July 2020 / Accepted: 19 September 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

In supervised learning, we often face ambiguous (A) samples that are difficult even for domain experts to label. In this paper, we consider a binary classification problem in the presence of such A samples. This problem is substantially different from semi-supervised learning, since unlabeled samples are not necessarily difficult samples. It is also different from 3-class classification with the positive (P), negative (N), and A classes, since we do not want to classify test samples into the A class. Our proposed method extends binary classification with reject option, which trains a classifier and a rejector simultaneously from P and N samples based on the 0-1-c loss with rejection cost c. More specifically, we propose to train a classifier and a rejector under the 0-1-c-d loss using P, N, and A samples, where d is the misclassification penalty for ambiguous samples. In our practical implementation, we use a convex upper bound of the 0-1-c-d loss for computational tractability. Numerical experiments demonstrate that our method can successfully utilize the additional information brought by such A training data.

Keywords: Ambiguous samples · Classification with reject option · Binary classification
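As a rough illustration of the two losses named in the abstract, the following Python sketch evaluates them on a handful of samples. The 0-1-c loss follows the usual reject-option convention (cost 0 for an accepted correct prediction, 1 for an accepted wrong prediction, c for a rejection). The form of the 0-1-c-d loss used here is an assumption, since this excerpt does not give its definition: an A sample is taken to cost d when it is accepted and 0 when it is rejected. The function names, the encoding of A samples as y = 0, and the convention that r(x) <= 0 means "reject" are all hypothetical choices made for this sketch.

```python
import numpy as np

def loss_01c(y, f, r, c):
    """0-1-c loss for labeled samples (y in {+1, -1}).

    f: classifier scores, r: rejector scores (r <= 0 means reject, by assumption).
    """
    y, f, r = (np.asarray(v, dtype=float) for v in (y, f, r))
    accepted = r > 0
    wrong = (y * f) <= 0
    # Accepted samples pay 0 or 1; rejected samples pay the rejection cost c.
    return np.where(accepted, wrong.astype(float), c)

def loss_01cd(y, f, r, c, d):
    """Assumed form of the 0-1-c-d loss; y == 0 marks an ambiguous (A) sample.

    Assumption: an accepted A sample costs d, a rejected A sample costs 0.
    """
    y, f, r = (np.asarray(v, dtype=float) for v in (y, f, r))
    is_a = y == 0
    # Substitute a dummy label for A samples so loss_01c stays well defined;
    # those entries are overwritten below.
    labeled = loss_01c(np.where(is_a, 1.0, y), f, r, c)
    ambiguous = np.where(r > 0, d, 0.0)
    return np.where(is_a, ambiguous, labeled)

# Five toy samples: accepted-correct P, accepted-wrong N, rejected P,
# accepted A, rejected A.
y = [1, -1, 1, 0, 0]
f = [2.0, 0.5, -1.0, 0.3, -0.2]
r = [1.0, 1.0, -1.0, 1.0, -1.0]
losses = loss_01cd(y, f, r, c=0.3, d=0.6)
```

Under these assumptions, the rejector is pushed to reject A samples (to avoid paying d) while rejecting as few P and N samples as possible (to avoid paying c). The abstract notes that the practical implementation replaces this discontinuous loss with a convex upper bound, which is not shown here.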

Machine Learning
Editors: Kee-Eung Kim and Vineeth N. Balasubramanian

* Naoya Otani [email protected]

¹ Nikon Corporation, Research and Development Division, 471, Nagaodai‑cho, Sakae‑ku, Yokohama‑city, Kanagawa 244‑8533, Japan
² RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
³ The University of Tokyo, Graduate School of Frontier Sciences, Chiba, Japan

1 Introduction

Supervised learning has been successfully deployed in various real-world applications, such as medical diagnosis (Bar et al. 2015; Wang et al. 2016; Esteva et al. 2017) and manufacturing systems (Park et al. 2016; Ren et al. 2017). However, when the amount of labeled data is limited, current supervised learning methods still do not work reliably (Pesapane et al. 2018). To efficiently obtain labeled data, domain knowledge has been used in many application areas (Ren et al. 2017; Cruciani et al. 2018; Konishi et al. 2019; Bejnordi et al. 2017). However, as some studies have pointed out (Wagner et al. 2005; Li et al. 2016; Shahriyar et al. 2018), there are often ambiguous samples that are substantially difficult to label even by domain experts.

The goal of this paper is to propose a novel classification method that can handle such ambiguous data. More specifically, we consider a binary classification problem where, in addition to positive (P) and negative (N) samples, ambiguous (A) samples are available for training a classifier. Naively, we may consider employing 3-class classification methods
