A Note on Likelihood Ratio Tests for Models with Latent Variables


THEORY AND METHODS

Yunxiao Chen and Irini Moustaki
LONDON SCHOOL OF ECONOMICS AND POLITICAL SCIENCE

Haoran Zhang
FUDAN UNIVERSITY

The likelihood ratio test (LRT) is widely used for comparing the relative fit of nested latent variable models. Following Wilks’ theorem, the LRT is conducted by comparing the LRT statistic with its asymptotic distribution under the restricted model, a χ² distribution with degrees of freedom equal to the difference in the number of free parameters between the two nested models under comparison. For models with latent variables such as factor analysis, structural equation models and random effects models, however, it is often found that the χ² approximation does not hold. In this note, we show how the regularity conditions of Wilks’ theorem may be violated using three examples of models with latent variables. In addition, a more general theory for LRT is given that provides the correct asymptotic theory for these LRTs. This general theory was first established in Chernoff (J R Stat Soc Ser B (Methodol) 45:404–413, 1954) and discussed in both van der Vaart (Asymptotic statistics, Cambridge, Cambridge University Press, 2000) and Drton (Ann Stat 37:979–1012, 2009), but it does not seem to have received enough attention. We illustrate this general theory with the three examples.

Key words: Wilks’ theorem, χ² distribution, latent variable models, random effects models, dimensionality, tangent cone.
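As a concrete illustration of the standard Wilks-based procedure described above, the following minimal Python sketch computes the LRT statistic, the degrees of freedom, and the χ² p-value from two maximized log-likelihoods. The function names `chi2_sf` and `wilks_lrt` and the numbers in the usage lines are hypothetical and chosen only for illustration; they do not come from the paper. The sketch assumes the regularity conditions of Wilks’ theorem hold, which is exactly the assumption this note shows can fail for latent variable models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 1 reduces to a normal tail, df = 2 to an exponential tail.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def wilks_lrt(loglik_restricted, loglik_full, n_params_restricted, n_params_full):
    """Return (statistic, df, p-value) using the chi-square reference of Wilks' theorem."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    df = n_params_full - n_params_restricted
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods and parameter counts for two nested models.
stat, df, p_value = wilks_lrt(-1050.0, -1046.0, 5, 7)
print(stat, df, p_value)  # statistic 8.0 on 2 degrees of freedom, p-value about 0.0183
```

When the null model lies on the boundary of the parameter space or at a singular point, the p-value returned by this sketch should not be trusted, because the reference distribution is no longer χ² with the naive degrees of freedom.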

1. Introduction

1.1. Literature on Likelihood Ratio Test

The likelihood ratio test (LRT) is one of the most popular methods for comparing nested models. When comparing two nested models that satisfy certain regularity conditions, the p-value of an LRT is obtained by comparing the LRT statistic with a χ² distribution with degrees of freedom equal to the difference in the number of free parameters between the two nested models. This reference distribution is suggested by the asymptotic theory of LRT that is known as Wilks’ theorem (Wilks 1938). However, for the statistical inference of models with latent variables (e.g., factor analysis, item factor analysis for categorical data, structural equation models, random effects models, finite mixture models), it is often found that the χ² approximation suggested by Wilks’ theorem does not hold. There are various published studies showing that the LRT is not valid under certain violations/conditions (e.g., small sample size, wrong model under the alternative hypothesis, large number of items, non-normally distributed variables, unique variances equal to zero, lack of identifiability), leading to over-factoring and over-rejections; see, e.g., Hakstian et al. (1982),

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11336-020-09735-0) contains supplementary material, which is available to authorized users.

Correspondence should be made to Yunxiao Chen, Department of Statistics, London School of Economics and Political Science, London, UK. Ema