Weak approximation of transformed stochastic gradient MCMC



Soma Yokoi¹,² · Takuma Otsuka³ · Issei Sato²,⁴

Received: 10 January 2020 / Revised: 22 July 2020 / Accepted: 11 August 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020

Abstract

Stochastic gradient Langevin dynamics (SGLD) is a computationally efficient sampler for Bayesian posterior inference given a large-scale dataset and a complex model. Although SGLD is designed for unbounded random variables, practical models often involve variables confined to a bounded domain, such as non-negative values or a finite interval. Variable transformation is a typical way to handle such bounded variables. This paper reveals, from both theoretical and empirical perspectives, that several mapping approaches commonly used in the literature produce erroneous samples. We show that a change of random variable in the discretization using an invertible Lipschitz mapping function overcomes this pitfall and attains weak convergence, whereas the other methods are numerically unstable or cannot be justified theoretically. Experiments demonstrate its efficacy for widely used models with bounded latent variables, including Bayesian non-negative matrix factorization and binary neural networks.

Keywords Stochastic gradient MCMC · Transform · Convergence analysis · Itô process

Editors: Ira Assent, Carlotta Domeniconi, Aristides Gionis, Eyke Hüllermeier.

* Soma Yokoi
  [email protected]

1 Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, 5‑1‑5 Kashiwanoha, Kashiwa‑shi, Chiba 277‑8561, Japan
2 RIKEN, 1‑4‑1 Nihonbashi, Chuo‑ku, Tokyo 103‑0027, Japan
3 NTT Communication Science Laboratories, NTT Corporation, 2‑4 Hikaridai, Seika‑cho, Kyoto 619‑0237, Japan
4 Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, 7‑3‑1 Hongo, Bunkyo‑ku, Tokyo 113‑0033, Japan




1 Introduction

Sampling a random variable from a given target distribution is a key problem in Bayesian inference. The Langevin Monte Carlo (LMC) algorithm has attracted attention for its high efficiency and scalability on large datasets. Whereas sampling methods in this category are usually designed to handle unbounded random variables, a target variable is often limited to some bounded space in practical problems. In such cases, the common practice is to match the domain of the variable with a transformation. For example, when a target variable must be non-negative, the exponential function is frequently adopted as the transformation. This paper discusses the problem of drawing samples from a constrained target distribution via a transform of unconstrained samples generated by the LMC algorithm. More precisely, let 𝜃 ∼ 𝜋𝜃(𝜃) be the target random variable on a constrained state space ℝc, e.g., a (semi-)finite interval of ℝ, and let 𝜑 ∼ 𝜋(𝜑) be a proxy random variable defined on the whole real line ℝ. Altho
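To make the setup concrete, the following is a minimal sketch, not the paper's proposed discretization scheme: it runs the unadjusted Langevin algorithm on the proxy 𝜑 ∈ ℝ and maps each sample back through 𝜃 = exp(𝜑) to obtain non-negative draws. The Gamma(a, b) target, step size, and iteration count are illustrative assumptions; the Jacobian of the map is absorbed into the proxy density via the change of variables log 𝜋(𝜑) = log 𝜋𝜃(e^𝜑) + 𝜑.

import numpy as np

# Minimal sketch: unadjusted Langevin dynamics on a proxy variable
# phi in R, targeting a non-negative theta ~ Gamma(a, b) through
# theta = exp(phi). The change of variables gives
#   log pi(phi) = log pi_theta(e^phi) + phi = a*phi - b*exp(phi) + const.
a, b = 2.0, 1.0            # illustrative Gamma shape/rate
eps = 1e-3                 # step size (weak discretization bias is O(eps))
n_steps = 100_000
rng = np.random.default_rng(0)

def grad_log_pi(phi):
    # d/dphi [a*phi - b*exp(phi)] = a - b*exp(phi)
    return a - b * np.exp(phi)

phi = 0.0
theta_samples = np.empty(n_steps)
for k in range(n_steps):
    phi += eps * grad_log_pi(phi) + np.sqrt(2 * eps) * rng.standard_normal()
    theta_samples[k] = np.exp(phi)   # map back to the constrained space

print(theta_samples.mean())   # close to a / b = 2 for a small step size

Even in this simple setting, where the transformation enters matters: here it is folded into the target density of the proxy, whereas the paper analyzes transformations applied within the discretization itself and identifies which choices attain weak convergence.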