Properties of the stochastic approximation EM algorithm with mini-batch sampling

  • PDF / 1,196,626 Bytes
  • 15 Pages / 595.276 x 790.866 pts Page_size
  • 85 Downloads / 191 Views

DOWNLOAD

REPORT


Properties of the stochastic approximation EM algorithm with mini-batch sampling Estelle Kuhn1 · Catherine Matias2 · Tabea Rebafka2 Received: 18 December 2019 / Accepted: 18 August 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract To deal with very large datasets a mini-batch version of the Monte Carlo Markov Chain Stochastic Approximation Expectation– Maximization algorithm for general latent variable models is proposed. For exponential models the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling results in an important speed-up of the convergence of the sequence of estimators generated by the algorithm. Moreover, insights on the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given. Keywords EM algorithm · Mini-batch sampling · Stochastic approximation · Monte Carlo Markov chain Mathematics Subject Classification 65C60 · 62F12

1 Introduction On very large datasets the computing time of the classical Expectation–Maximization (EM) algorithm (Dempster et al. 1977) as well as its variants such as Monte Carlo EM, Stochastic Approximation EM, Monte Carlo Markov ChainSAEM and others can be very long, since all data points are visited in every iteration. To circumvent this problem, a number of EM-type algorithms have been proposed, namely various mini-batch (Neal and Hinton 1999; Liang and Klein 2009; Karimi et al. 2018; Nguyen et al. 2020) and online (Titterington 1984; Lange 1995; Cappé and Moulines 2009; Cappé 2011) versions of the EM algorithm. They all consist in using only a part of the observations during one iteration in

B

Tabea Rebafka [email protected] Estelle Kuhn [email protected] Catherine Matias [email protected]

1

MaIAGE, INRAE, Université Paris-Saclay, Jouy-en-Josas, France

2

Laboratoire de Probabilités, Statistique et Modélisation (LPSM), Sorbonne Université, Université de Paris, CNRS, Paris, France

order to shorten computing time and accelerate convergence. While online algorithms process a single observation per iteration handled in the order of arrival, mini-batch algorithms use larger, randomly chosen subsets of observations. The size of these subsets of data is generally called the mini-batch size. Choosing large mini-batch sizes entails long computing times, while very small mini-batch sizes as well as online algorithms may result in a loss of accuracy of the algorithm. This raises the question about the optimal mini-batch size that would achieve a compromise between accuracy and computing time. However this issue is generally overlooked. In this article, we propose a mini-batch version of the MCMC–SAEM algorithm (Delyon et al. 1999; Kuhn and Lavielle 2004). The original MCMC–SAEM algorithm i