Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion

Marco Bertoletti1 · Nial Friel2 · Riccardo Rastelli2

Received: 14 November 2014 / Accepted: 3 May 2015 © Sapienza Università di Roma 2015

Abstract The integrated completed likelihood (ICL) criterion has proven to be a very popular approach in model-based clustering for automatically choosing the number of clusters in a mixture model. This approach effectively maximises the complete-data likelihood, thereby including the allocation of observations to clusters in the model selection criterion. However, for practical implementation one needs to introduce an approximation in order to estimate the ICL. Our contribution here is to illustrate that, through the use of conjugate priors, one can derive an exact expression for the ICL and so avoid any approximation. Moreover, we illustrate how one can find both the number of clusters and the best allocation of observations within a single algorithmic framework. The performance of our algorithm is presented on several simulated and real examples.

Keywords Integrated completed likelihood · Finite mixture models · Model-based clustering · Greedy search
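To make the contribution concrete before the formal development, here is a minimal sketch of an exact ICL computation for one simple conjugate family: a Bernoulli mixture for binary data with a symmetric Dirichlet prior on the mixture weights and Beta priors on the component probabilities. The function name and the hyperparameters gamma, alpha and beta are illustrative assumptions, not the paper's notation; the paper treats more general conjugate families.

```python
import numpy as np
from scipy.special import gammaln, betaln

def exact_icl_bernoulli(x, z, K, gamma=1.0, alpha=0.5, beta=0.5):
    """Exact ICL = log f(x, z | K) for a Bernoulli mixture with a
    symmetric Dirichlet(gamma) prior on the weights and Beta(alpha, beta)
    priors on the component probabilities (illustrative sketch only)."""
    n, d = x.shape
    # Allocation term log f(z | K): Dirichlet-multinomial in closed form.
    counts = np.bincount(z, minlength=K)
    log_fz = (gammaln(K * gamma) - gammaln(K * gamma + n)
              + np.sum(gammaln(counts + gamma) - gammaln(gamma)))
    # Data term log f(x | z, K): Beta-Bernoulli, per cluster and feature.
    log_fx = 0.0
    for k in range(K):
        xk = x[z == k]
        s = xk.sum(axis=0)        # per-feature successes in cluster k
        nk = xk.shape[0]          # cluster size
        log_fx += np.sum(betaln(alpha + s, beta + nk - s)
                         - betaln(alpha, beta))
    return log_fz + log_fx
```

A greedy search in the spirit of the paper's framework would then propose reallocations of individual observations (possibly emptying or creating clusters) and accept any move that increases this exact ICL.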

1 Introduction

Finite mixture models are a widely used approach to parametric cluster analysis. Choosing the number of components in the mixture model, usually viewed as a model choice problem, is a crucial issue. In a Bayesian framework, model choice can be dealt with via Markov chain Monte Carlo, where the number of components is estimated simultaneously with the model parameters using the reversible jump algorithm of [9], extended to the context of finite mixtures by [17]. An alternative approach was introduced in [16], where the authors propose to integrate out the model parameters in order to achieve better estimation and more efficient sampling.

Corresponding author: Riccardo Rastelli ([email protected])

1 Department of Statistical Sciences, University of Bologna, Bologna, Italy

2 Insight: Centre for Data Analytics and School of Mathematical Sciences, University College Dublin, Dublin, Ireland


The resulting algorithm, called the allocation sampler, carries out inference on the allocations (the cluster labels of the observations) and the number of components K within a single framework. Similar approaches based on collapsing the model parameters have been applied in different contexts, such as network models, as shown in [6,13,21,22]. A more standard approach to model choice relies instead on the maximisation of the Bayesian information criterion (BIC), which again has a Bayesian derivation but can be used in a more objective, frequentist fashion. Throughout the paper we will denote the observed data by x. Each observation is allocated to one group and one only, and the so-called allocation variable z is categorical, taking values in {1, . . . , K}. Now, let θ̂K be the estimated model parameters under the assumption of a model with K components; then the log model evidence log f(x|K) is approximated by

log f(x|K) ≈ log f(x | θ̂K, K) − (νK/2) log n,

where νK is the number of free parameters of the model with K components and n is the number of observations.
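As a concrete illustration of this standard criterion (and not of the exact ICL approach developed in this paper), the sketch below compares candidate values of K for a simulated univariate Gaussian mixture using BIC. The data and settings are illustrative; note that scikit-learn's bic() uses the −2 × log-likelihood convention, so lower values correspond to a higher approximate log evidence.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulate two well-separated Gaussian clusters (illustrative data).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, (100, 1)),
                    rng.normal(3.0, 1.0, (100, 1))])

# Fit mixtures for a range of K and score each by BIC.
# sklearn: bic(x) = -2 * log-likelihood + nu_K * log(n), smaller is better.
bic = {K: GaussianMixture(n_components=K, random_state=0).fit(x).bic(x)
       for K in range(1, 6)}
best_K = min(bic, key=bic.get)
print(bic, best_K)
```

Unlike the exact ICL developed in this paper, this procedure scores only the model dimension K through a point estimate θ̂K; it does not score the allocations themselves.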