An accelerated EM algorithm for mixture models with uncertainty for rating data
Rosaria Simone
Received: 8 November 2018 / Accepted: 16 June 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract The paper is framed within the literature on Louis’ identity for the observed information matrix in incomplete data problems, with a focus on the implied acceleration of maximum likelihood estimation for mixture models. The goal is twofold: to obtain direct expressions for the standard errors of parameters from the EM algorithm, and to reduce the computational burden of the estimation procedure for a class of mixture models with uncertainty for rating variables. This result makes best-subset variable selection more feasible, which is an advisable strategy to identify response patterns from regression models for all Mixtures of Experts systems. The discussion is supported by simulation experiments and a real case study.

Keywords Louis’ Identity · Accelerated EM algorithm · CUB Mixture models · Rating data · Standard errors
Rosaria Simone ([email protected]), Department of Political Sciences, University of Naples Federico II, Via Leopoldo Rodinò, 22, 80138 Napoli, Italy

1 Motivation

As survey data become more widely available and interest in the information they convey grows, so does the need to account for measurement error and to check the veracity of responses. This issue is crucial when collecting ordinal scores as evaluations and preferences (Agresti 2010; Tutz 2012). For discrete outcomes that convey latent perceptions, uncertainty is understood as fuzziness of the choice; it is thus related to the attitude of respondents, to the wording and usage of the measurement scale and, in general, to the modalities and circumstances of the response. A mixture paradigm is an adequate framework for combining the expression of the true score with such nuisance effects. For rating data, the logic of CUB models (Piccolo 2003; D’Elia and Piccolo 2005; Piccolo and Simone 2019a, b) adheres to this rationale by assuming a mixture of two discrete distributions that shape the feeling and the uncertainty of the rating mechanism. The acronym CUB stands for Combination of Uniform and Binomial, the distributions that define the mixture components in the baseline specification.

Given the model fit, response profiles can be analyzed in terms of conditional distributions once mixture parameters are linked to concomitant covariates (as in marketing studies, to identify market segments, or to summarize social and behavioral measurements such as happiness or risk perception). The search for the best model then requires variable selection for each model feature. Beyond regularization methods (Tibshirani 1996), an advisable strategy would be a best-subset search in the covariate input space. However, statistical models should guarantee parsimonious interpretation of results with limited computational effort: the task of running an exhaustive model
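As a minimal sketch of the baseline specification described above (a two-component mixture of a shifted Binomial and a discrete Uniform over the ratings 1, ..., m), the response probabilities can be computed as follows. The function name `cub_pmf` and the argument names `pi` (mixing weight of the Binomial component) and `xi` (Binomial parameter, with 1 - xi read as the feeling) are assumptions chosen here to follow the usual notation in the CUB literature; they do not appear in the text above.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """Return [Pr(R = 1), ..., Pr(R = m)] for a baseline CUB mixture.

    Assumed parameterization (not stated in the text above):
      Pr(R = r) = pi * C(m-1, r-1) * (1-xi)**(r-1) * xi**(m-r) + (1-pi) / m
    where pi in (0, 1] weights the shifted Binomial component and
    1 - pi weights the discrete Uniform (uncertainty) component.
    """
    if m < 2:
        raise ValueError("m must be at least 2 rating categories")
    if not (0.0 < pi <= 1.0):
        raise ValueError("pi must lie in (0, 1]")
    if not (0.0 <= xi <= 1.0):
        raise ValueError("xi must lie in [0, 1]")

    probs = []
    for r in range(1, m + 1):
        # Shifted Binomial: r - 1 "successes" out of m - 1 trials.
        shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        # Discrete Uniform: every category is equally likely.
        uniform = 1.0 / m
        probs.append(pi * shifted_binomial + (1 - pi) * uniform)
    return probs


if __name__ == "__main__":
    # Illustrative values only: a 7-point scale with moderate uncertainty.
    for r, p in enumerate(cub_pmf(m=7, pi=0.7, xi=0.3), start=1):
        print(f"Pr(R = {r}) = {p:.4f}")
```

With `pi = 1` the Uniform component vanishes and the distribution reduces to a shifted Binomial; lowering `pi` flattens the distribution toward the Uniform, which is how the mixture encodes uncertainty in the response.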