Agreeing to disagree: active learning with noisy labels without crowdsourcing



ORIGINAL ARTICLE

Agreeing to disagree: active learning with noisy labels without crowdsourcing

Mohamed‑Rafik Bouguelia¹ · Slawomir Nowaczyk¹ · K. C. Santosh² · Antanas Verikas¹

Received: 1 September 2016 / Accepted: 18 January 2017 © Springer-Verlag Berlin Heidelberg 2017

Abstract  We propose a new active learning method for classification that handles label noise without relying on multiple oracles (i.e., crowdsourcing). We first propose a strategy that selects (for labeling) instances with a high influence on the learned model. An instance x is said to have a high influence on the model h if training h on x (with label y = h(x)) would result in a model that greatly disagrees with h on the labels of other instances. We then propose another strategy that selects (for labeling) instances that are highly influenced by changes in the learned model. An instance x is said to be highly influenced if training h with a set of instances would result in a committee of models that agree on a common label for x but disagree with h(x). We compare the two strategies and show, on different publicly available datasets, that selecting instances according to the first strategy while eliminating noisy labels according to the second greatly improves accuracy compared to several benchmark methods, even when a significant number of instances are mislabeled.
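To make the two selection criteria concrete, the following is a minimal Python sketch, assuming a scikit-learn-style classifier h already fitted on labeled data (X_train, y_train). The function names, the use of the prediction-change rate as the disagreement measure, and the unanimity test for the committee are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.base import clone

def influence_on_model(h, X_train, y_train, x, X_pool):
    """First strategy (sketch): influence of instance x on model h,
    measured as the fraction of pool predictions that change after
    training h on x labeled with h's own prediction y = h(x)."""
    x = x.reshape(1, -1)
    y_self = h.predict(x)                      # label x with the current model's prediction
    h_new = clone(h).fit(np.vstack([X_train, x]),
                         np.concatenate([y_train, y_self]))
    return np.mean(h.predict(X_pool) != h_new.predict(X_pool))

def influenced_by_changes(h, X_train, y_train, x, extra_sets):
    """Second strategy (sketch): x is highly influenced if a committee
    of models, each trained on the labeled data plus one extra set of
    (instances, labels), unanimously agrees on a label for x that
    disagrees with h(x)."""
    x = x.reshape(1, -1)
    votes = []
    for X_extra, y_extra in extra_sets:        # hypothetical additional training sets
        h_i = clone(h).fit(np.vstack([X_train, X_extra]),
                           np.concatenate([y_train, y_extra]))
        votes.append(h_i.predict(x)[0])
    unanimous = len(set(votes)) == 1           # committee agrees on a common label
    return unanimous and votes[0] != h.predict(x)[0]
```

Under these assumptions, querying the pool instance that maximizes influence_on_model implements the first selection strategy, while flagging labels that fail the influenced_by_changes test corresponds, in spirit, to the noise-elimination step.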

* K. C. Santosh
  [email protected]

  Mohamed‑Rafik Bouguelia
  [email protected]

  Slawomir Nowaczyk
  [email protected]

  Antanas Verikas
  [email protected]

1  Center for Applied Intelligent Systems Research, Halmstad University, 30118 Halmstad, Sweden

2  Department of Computer Science, The University of South Dakota, 414 E Clark St, Vermillion, SD 57069, USA



Keywords  Active learning · Classification · Label noise · Mislabeling

1 Introduction

In order to learn a classification model, supervised learning algorithms need a training dataset in which each instance is manually labeled. Given a large amount of unlabeled instances, one needs to manually label as many instances as possible. In passive learning, the instances to be labeled are selected at random and presented to a human labeler (the oracle). In this setting, learning methods need a large amount of labeled data to produce a well-performing classifier, yet labeling is costly and time-consuming. Semi-supervised learning methods such as [21] learn from both labeled and unlabeled data, and can therefore reduce the labeling cost to some extent. Active learning methods reduce the labeling cost further by allowing interaction between the learning algorithm and the oracle: unlike passive learning, active learning lets the learner choose which instances are most appropriate for labeling, according to an informativeness measure. The main problem that active learning addresses is defining informativeness in a way that reduces the number of instances to be labeled while improving the classifier's performance. This is an important problem beca