On generalization in moment-based domain adaptation



Werner Zellinger¹ · Bernhard A. Moser¹ · Susanne Saminger-Platz²

Accepted: 11 November 2020
© Springer Nature Switzerland AG 2020

Werner Zellinger: [email protected]
Bernhard A. Moser: [email protected]
Susanne Saminger-Platz: [email protected]

¹ Data Science, Software Competence Center Hagenberg GmbH (SCCH), Hagenberg im Mühlkreis, Austria
² Department of Knowledge-Based Mathematical Systems, Johannes Kepler University Linz, Linz, Austria

Abstract
Domain adaptation algorithms are designed to minimize the misclassification risk of a discriminative model for a target domain with little training data by adapting a model from a source domain with a large amount of training data. Standard approaches measure the adaptation discrepancy based on distance measures between the empirical probability distributions in the source and target domain. In this setting, we address the problem of deriving generalization bounds under practice-oriented general conditions on the underlying probability distributions. As a result, we obtain generalization bounds for domain adaptation based on finitely many moments and smoothness conditions.

Keywords Transfer learning · Domain adaptation · Moment distance · Learning theory · Classification · Total variation distance · Probability metric

Mathematics Subject Classification (2010) 68Q32 · 68T05 · 68T30

1 Motivation

Domain adaptation problems are encountered in the everyday engineering of machine learning applications whenever there is a discrepancy between the assumptions of the learning setting and those of the application setting. For example, most theoretical and practical results in statistical learning are based on the assumption that the training and test samples are drawn from the same distribution. As outlined in [1–4], however, this assumption may be violated in typical applications such as natural language processing [5, 6] and computer vision [7–9].

In this work, we relax the classical assumption of identical distributions in the training and application setting by postulating that only a finite number of moments of these distributions are aligned. This postulate has a two-fold motivation. First, it reflects the common methodology of overcoming a present difference in distributions by mapping the samples into a latent model space in which the resulting distributions are aligned; see Fig. 1 for an illustration. Moment-based algorithms perform particularly well in many practical tasks [8, 10–23]. Second, it is motivated by the current scientific discussion about the choice of an appropriate distance function for domain adaptation [7, 24–28]. For compactly supported distributions, convergence in most common probability metrics implies the convergence of finitely many moments; in particular, many common probability metrics admit upper bounds on moment-based distances, see Fig. 2. Therefore, results under the proposed setting can also give theoretical insights into approaches based on stronger concepts of similarity such as the Wasserstein distance.
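
To make the postulate of aligning finitely many moments concrete, the following minimal Python/NumPy sketch compares two empirical samples through their first k coordinate-wise moments, in the spirit of moment-matching approaches. It is an illustration only, not the exact distance analyzed in this paper: the function name moment_distance, the choice of coordinate-wise central moments, and the default k = 3 are assumptions made for the example.

    import numpy as np

    def moment_distance(x, y, k=3):
        """Illustrative discrepancy between two samples based on their
        first k empirical moments: difference of means plus differences
        of central moments of orders 2..k, compared coordinate-wise.

        x, y: arrays of shape (n_samples, n_features), assumed to lie in
        a compact set, e.g. activations of a bounded latent layer.
        """
        # First moments: Euclidean distance between empirical means.
        d = np.linalg.norm(x.mean(axis=0) - y.mean(axis=0))
        # Higher-order central moments, compared order by order.
        xc = x - x.mean(axis=0)
        yc = y - y.mean(axis=0)
        for order in range(2, k + 1):
            d += np.linalg.norm((xc ** order).mean(axis=0)
                                - (yc ** order).mean(axis=0))
        return d

    # Example: compactly supported source and target samples whose
    # low-order moments differ slightly.
    rng = np.random.default_rng(0)
    source = rng.uniform(0.0, 1.0, size=(1000, 5))
    target = rng.uniform(0.1, 0.9, size=(1000, 5))
    print(moment_distance(source, target, k=3))

A distance of this form is cheap to compute from samples and vanishes exactly when the first k empirical moments agree, which matches the relaxed similarity notion postulated above: distributions are treated as aligned once finitely many of their moments coincide, rather than requiring equality of the full distributions.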