Skew Gaussian processes for classification



Alessio Benavoli¹ · Dario Azzimonti² · Dario Piga²

Received: 9 January 2020 / Revised: 5 May 2020 / Accepted: 12 August 2020
© The Author(s) 2020

Abstract

Gaussian processes (GPs) are distributions over functions, which provide a Bayesian nonparametric approach to regression and classification. In spite of their success, GPs have limited use in some applications: for example, a distribution that is symmetric about its mean can be an unreasonable model, since symmetry forces the mean and the median to coincide, whereas in an asymmetric (skewed) distribution they can differ. In this paper, we propose skew-Gaussian processes (SkewGPs) as a nonparametric prior over functions. A SkewGP extends the multivariate unified skew-normal distribution over finite-dimensional vectors to a stochastic process. The SkewGP class of distributions includes GPs; SkewGPs therefore inherit all the good properties of GPs while increasing their flexibility by allowing asymmetry in the probabilistic model. By exploiting the fact that the SkewGP prior and the probit likelihood are conjugate, we derive closed-form expressions for the marginal likelihood and predictive distribution of this new nonparametric classifier. We verify empirically that the proposed SkewGP classifier outperforms a GP classifier based on either Laplace's method or expectation propagation.

Keywords: Skew Gaussian Process · Nonparametric · Classifier · Probit · Conjugate · Skew

1 Introduction

Gaussian processes (GPs) extend multivariate Gaussian distributions over finite-dimensional vectors to infinite dimensionality. Specifically, a GP defines a distribution over functions; that is, each draw from a Gaussian process is a function. GPs therefore provide a principled, practical, and probabilistic approach to nonparametric regression and classification, and they have been applied successfully in many domains (Rasmussen and Williams 2006).

Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

* Alessio Benavoli, [email protected]

1 Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland
2 Dalle Molle Institute for Artificial Intelligence Research (IDSIA) - USI/SUPSI, Manno, Switzerland
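The statement that each draw from a GP is a function can be made concrete with a short sketch. The snippet below (illustrative only, not from the paper; the squared-exponential kernel and its hyperparameters are assumptions) evaluates a GP prior at a finite grid of inputs and draws sample paths from the resulting multivariate Gaussian:

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance (a common, illustrative kernel choice)."""
    sqdist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)                     # finite grid of input locations
K = rbf_kernel(x, x)                               # prior covariance at the grid
L = np.linalg.cholesky(K + 1e-9 * np.eye(x.size))  # small jitter for numerical stability
samples = L @ rng.standard_normal((x.size, 3))     # three prior function draws, shape (50, 3)
```

Each column of `samples` is one draw: a smooth random function evaluated on the grid, with smoothness governed by the kernel's lengthscale.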






Machine Learning

GPs have several desirable mathematical properties. The most appealing one is that, for regression with Gaussian noise, the prior distribution is conjugate for the likelihood function. Therefore, the Bayesian update step is analytic, as is computing the predictive distribution for the function behavior at unknown locations. In spite of their success, GPs have several known shortcomings. First, the Gaussian distribution is not a "heavy-tailed" distribution, and so it is not robust to extreme outliers. Alternatives to GPs have been proposed, the most notable example being the class of elliptical processes (Fang 2018), such as Student-t processes (O'
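The analytic update mentioned above is the standard GP regression posterior (see Rasmussen and Williams 2006); the following sketch implements it for illustration. The kernel, data, and noise level are assumptions chosen for the example, not taken from the paper:

```python
import numpy as np

def rbf(xa, xb, ell=1.0, var=1.0):
    """Squared-exponential kernel (an illustrative choice)."""
    return var * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell ** 2)

def gp_posterior(X, y, Xs, noise_var=0.01):
    """Closed-form GP posterior mean and covariance at test inputs Xs."""
    K = rbf(X, X) + noise_var * np.eye(X.size)   # train covariance plus noise
    Ks = rbf(X, Xs)                              # train-test cross-covariance
    Kss = rbf(Xs, Xs)                            # test covariance
    L = np.linalg.cholesky(K)                    # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha                          # predictive mean
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                          # predictive covariance
    return mean, cov

X = np.array([-1.0, 0.0, 1.0])                   # toy training inputs
y = np.sin(X)                                    # toy noisy-free observations
mean, cov = gp_posterior(X, y, np.array([0.5]))  # predict at a new location
```

No iterative inference is needed: the posterior follows from a single linear solve, which is exactly the conjugacy property that classification with non-Gaussian likelihoods loses and that the paper's SkewGP/probit pairing recovers.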