

INVITED REVIEW

A review of deterministic approximate inference techniques for Bayesian machine learning

Shiliang Sun

Received: 16 June 2013 / Accepted: 18 June 2013 / Published online: 3 July 2013
© Springer-Verlag London 2013

S. Sun (✉)
Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
e-mail: [email protected]; [email protected]

Abstract  A central task of Bayesian machine learning is to infer the posterior distribution of hidden random variables given observations and to calculate expectations with respect to this distribution. However, this is often computationally intractable, so approximation schemes must be sought. Deterministic approximate inference techniques are an alternative to the stochastic approximate inference methods based on numerical sampling, namely Monte Carlo techniques, and many advances have been made in this field during the last 15 years. This paper reviews typical deterministic approximate inference techniques, some of which are very recent and need further exploration. With the aim of promoting research in deterministic approximate inference, we also attempt to identify open problems that may be helpful for future investigations in this field.

Keywords  Uncertainty · Probabilistic models · Bayesian machine learning · Posterior distribution · Deterministic approximate inference
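As a brief sketch of the inference problem described in the abstract (the symbols z for hidden variables, x for observations, and f for a test function are our own shorthand, not notation taken from the paper), the quantities of interest are

p(\mathbf{z} \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})}{p(\mathbf{x})},
\qquad
p(\mathbf{x}) = \int p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z},
\qquad
\mathbb{E}_{p(\mathbf{z} \mid \mathbf{x})}\!\left[f(\mathbf{z})\right] = \int f(\mathbf{z})\, p(\mathbf{z} \mid \mathbf{x})\, \mathrm{d}\mathbf{z}.

The normalizing constant p(x) and the posterior expectation are integrals (or sums) over all configurations of the hidden variables; in most models of practical interest they have no closed form and are too high-dimensional for exact numerical evaluation, which is what motivates the approximate inference techniques reviewed below.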

1 Introduction

Uncertainty is a key concept in modern artificial intelligence and human decision making; it naturally arises in situations where insufficient information is available or some determining factors are not observed [5, 35, 40]. Probabilistic models, which represent a probability distribution over random variables, provide a principled and solid framework for resolving problems involving uncertainty. A probabilistic model usually consists of three components: deterministic parameters, hidden variables (including latent variables and stochastic parameters), and observable variables, which jointly specify the probability distribution. The hidden and observable variables are both random variables, though the latter are usually clamped to their observed values. The distinction between latent variables and stochastic parameters lies in the fact that the number of latent variables grows with the size of the observed data set, while the number of stochastic parameters is fixed independently of that size [6]. Hidden variables may correspond to missing data, or they may be introduced purely to allow complicated and powerful distributions to be formed. Note that, for easy visualization and investigation of properties, a probabilistic model is often represented as a graphical model.

Determining a single value or a distribution of values for the parameters and latent variables in a probabilistic model from experience (i.e., data) is one of the core missions of machine learning. The determined value or distribution can then be used for decision making such as classification and regression. For this purpose, people have to reso