Subsampling sequential Monte Carlo for static Bayesian models



David Gunawan¹,⁴ · Khue-Dung Dang²,⁴ · Matias Quiroz²,⁴,⁵ · Robert Kohn³,⁴ · Minh-Ngoc Tran⁴,⁶

Corresponding author: David Gunawan, [email protected]

¹ School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia
² School of Mathematical and Physical Sciences, University of Technology Sydney, Ultimo, Australia
³ School of Economics, UNSW Business School, University of New South Wales, Kensington, Australia
⁴ ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Parkville, Australia
⁵ Research Division, Sveriges Riksbank, Stockholm, Sweden
⁶ Discipline of Business Analytics, University of Sydney, Sydney, Australia

Received: 1 May 2019 / Accepted: 27 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

We show how to speed up sequential Monte Carlo (SMC) for Bayesian inference in large-data problems by data subsampling. SMC sequentially updates a cloud of particles through a sequence of distributions, beginning with a distribution that is easy to sample from, such as the prior, and ending with the posterior distribution. Each update of the particle cloud consists of three steps: reweighting, resampling, and moving. In the move step, each particle is moved using a Markov kernel; this is typically the most computationally expensive part, particularly when the dataset is large. It is crucial to have an efficient move step to ensure particle diversity. Our article makes two important contributions. First, to speed up the SMC computation, we use an approximately unbiased and efficient annealed likelihood estimator based on data subsampling. The subsampling approach is more memory efficient than the corresponding full-data SMC, which is an advantage for parallel computation. Second, we use a Metropolis-within-Gibbs kernel with two conditional updates. A Hamiltonian Monte Carlo update makes distant moves for the model parameters, and a block pseudo-marginal proposal is used for the particles corresponding to the auxiliary variables for the data subsampling. We demonstrate both the usefulness and limitations of the methodology for estimating four generalized linear models and a generalized additive model with large datasets.

Keywords: Hamiltonian Monte Carlo · Large datasets · Likelihood annealing
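To make the reweight–resample–move recursion concrete, the following is a minimal sketch of likelihood-annealed SMC on a toy Gaussian mean model. The linear tempering schedule, the single random-walk Metropolis move step, and the model itself are illustrative assumptions, not the paper's method; in particular, the sketch evaluates the full-data likelihood, whereas the paper replaces it with a subsampling-based annealed likelihood estimator inside the same recursion.

```python
# A minimal sketch of likelihood-annealed SMC for a static model, assuming a
# toy Gaussian likelihood; schedule, kernel, and model are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=1000)           # synthetic data, unknown mean theta

def log_like(theta):                          # full-data log-likelihood, vectorized
    return -0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)

def log_prior(theta):                         # N(0, 10^2) prior on theta
    return -0.5 * (theta / 10.0) ** 2

P = 500                                       # number of particles
temps = np.linspace(0.0, 1.0, 21)             # annealing schedule 0 = a_0 < ... < a_T = 1
theta = rng.normal(0.0, 10.0, size=P)         # initialize particles from the prior

for a_prev, a_curr in zip(temps[:-1], temps[1:]):
    # Reweight: incremental weight is L(theta)^(a_t - a_{t-1}).
    logw = (a_curr - a_prev) * log_like(theta)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample: multinomial resampling to equalize the weights.
    theta = theta[rng.choice(P, size=P, p=w)]
    # Move: one random-walk Metropolis step targeting p(theta) * L(theta)^a_t.
    prop = theta + 0.1 * rng.normal(size=P)
    log_acc = (log_prior(prop) + a_curr * log_like(prop)
               - log_prior(theta) - a_curr * log_like(theta))
    accept = np.log(rng.uniform(size=P)) < log_acc
    theta = np.where(accept, prop, theta)

print("posterior mean estimate:", theta.mean())
```

Swapping the exact log-likelihood for an unbiased subsampled estimate, as the paper does, turns the move step into a pseudo-marginal update, which is why the auxiliary subsampling variables must then be updated jointly with the parameters.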

1 Introduction

The aim of Bayesian inference is to obtain the posterior distribution of the unknown parameters and, in particular, the posterior expectations of functions of the parameters. These expectations are usually estimated using samples from the posterior distribution. Exact approaches such as Markov chain Monte Carlo (MCMC) (Brooks et al. 2011) have mostly been used for sampling from complex posterior distributions.


However, MCMC methods have some notable drawbacks and limitations. One drawback, often overlooked by practitioners when fitting complex models, is the failure to converge caused by poorly mixing chains. While Hamiltonian Monte Carlo (Neal 2011, HMC