Entropy Repulsion for Semi-supervised Learning Against Class Mismatch



1 University of Science and Technology of China, Hefei, China
[email protected], [email protected]
2 TNLIST, Tsinghua University, Beijing, China

Abstract. A series of semi-supervised learning (SSL) algorithms have been proposed to alleviate the need for labeled data by leveraging large amounts of unlabeled data. These algorithms achieve good performance on standard benchmark datasets; however, their performance can degrade drastically when there is a class mismatch between the labeled and unlabeled data, which is common in practice. In this work, we propose a new technique, entropy repulsion for mismatch (ERCM), to improve SSL under class mismatch. Specifically, we design an entropy repulsion loss and a batch annealing and reloading mechanism, which work together to prevent potentially mismatched unlabeled data from participating in the early training stages and to facilitate the minimization of the unsupervised loss term of traditional SSL algorithms. ERCM can be adopted to enhance existing SSL algorithms with minor extra computation cost and no change to their network structures. Our extensive experiments demonstrate that ERCM can significantly improve the performance of state-of-the-art SSL algorithms, namely Mean Teacher, Virtual Adversarial Training (VAT) and Mixmatch, in various class-mismatch cases.

Keywords: Semi-supervised learning · Class mismatch

1 Introduction

Deep learning models have achieved remarkable performance on many supervised learning problems by leveraging large labeled datasets [12]. Creating large datasets with high-quality labels, however, is usually labor-intensive and time-consuming [21,24]. Semi-supervised learning (SSL) [3] provides an attractive way to improve the performance of deep learning models by also utilizing easily obtainable unlabeled data, mitigating the reliance on large labeled datasets. SSL algorithms are mainly built on the following core ideas: consistency regularization [11,14,19], entropy minimization [7,13], and traditional regularization [23]. Recent holistic approaches, Mixmatch [2] and UDA [20], achieve state-of-the-art performance by combining these ideas.
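To make the first two core ideas concrete, the following is a minimal PyTorch sketch of the standard entropy-minimization and consistency-regularization losses on unlabeled data; the function names and the MSE form of the consistency term are illustrative choices, not quoted from any of the cited papers.

import torch
import torch.nn.functional as F

def entropy_minimization_loss(logits_u):
    # Entropy minimization [7,13]: encourage confident (low-entropy)
    # predictions on unlabeled data, H(p) = -sum_c p_c log p_c,
    # averaged over the batch.
    p = F.softmax(logits_u, dim=1)
    return -(p * F.log_softmax(logits_u, dim=1)).sum(dim=1).mean()

def consistency_loss(logits_u, logits_u_aug):
    # Consistency regularization [11,14,19]: predictions should be
    # stable under input perturbation/augmentation (MSE variant).
    return F.mse_loss(F.softmax(logits_u, dim=1),
                      F.softmax(logits_u_aug, dim=1))

A typical SSL objective combines the supervised cross-entropy on labeled data with a weighted sum of such unsupervised terms.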

Existing SSL algorithms usually demonstrate their successes on fully-labeled classification datasets (e.g., CIFAR-10 [10], SVHN [15] and ImageNet [5]) by treating most samples of each dataset as unlabeled. Those evaluation results are therefore based on an implicit assumption that all unlabeled samples come from the same classes as the labeled samples. In the real world, however, it is very likely that a large portion of the unlabeled samples do not belong to any class of the labeled data, i.e., there exists a mismatch between the class distributions of the labeled and unlabeled data. As an example, if you intend to train a model to distinguish between ten classes of animals with only a small amount of labeled data, unlabeled images collected in the wild may well contain animals outside those ten classes.
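To make the mechanism described in the abstract concrete, the following is a hypothetical sketch of how prediction entropy could be used to keep potentially mismatched unlabeled samples out of the early training stages; the function name, the threshold tau, and the warmup_epochs schedule are illustrative assumptions, not the paper's actual entropy repulsion loss or batch annealing and reloading mechanism.

import torch
import torch.nn.functional as F

def early_stage_mask(logits_u, epoch, warmup_epochs=20, tau=0.5):
    # Normalized prediction entropy in [0, 1]: high values indicate
    # samples the model cannot place in any known class, which are
    # more likely to be out-of-class (mismatched).
    p = F.softmax(logits_u, dim=1)
    ent = -(p * torch.log(p.clamp_min(1e-12))).sum(dim=1)
    ent = ent / torch.log(torch.tensor(float(logits_u.size(1))))
    if epoch < warmup_epochs:
        # Early stages: admit only low-entropy samples into the
        # unsupervised loss (hypothetical threshold tau).
        return ent < tau
    # Later stages: admit the full unlabeled batch.
    return torch.ones_like(ent, dtype=torch.bool)

In such a scheme, the per-sample unsupervised loss would be multiplied by this mask before averaging, so that suspect samples contribute nothing until training has stabilized.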