Mixtures of factor analyzers with scale mixtures of fundamental skew normal distributions

Sharon X. Lee1 · Tsung-I Lin2,3 · Geoffrey J. McLachlan4

Received: 25 October 2018 / Revised: 17 March 2020 / Accepted: 24 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Mixtures of factor analyzers (MFA) provide a powerful tool for modelling high-dimensional datasets. In recent years, several generalizations of MFA have been developed in which the normality assumption on the factors and/or the errors is relaxed to allow for skewness in the data. However, due to the form of the adopted component densities, the distribution of the factors/errors in most of these models is typically limited to modelling skewness concentrated in a single direction. Here, we introduce a more flexible finite mixture of factor analyzers based on the class of scale mixtures of canonical fundamental skew normal (SMCFUSN) distributions. This very general class of skew distributions can capture various types of skewness and asymmetry in the data. In particular, the proposed mixtures of SMCFUSN factor analyzers (SMCFUSNFA) can simultaneously accommodate multiple directions of skewness. As such, the model encapsulates many commonly used models as special and/or limiting cases, such as some versions of skew normal and skew t-factor analyzers, and skew hyperbolic factor analyzers. For illustration, we focus on the t-distribution member of the class of SMCFUSN distributions, leading to mixtures of canonical fundamental skew t-factor analyzers (CFUSTFA). Parameter estimation can be carried out by maximum likelihood via an EM-type algorithm. The usefulness and potential of the proposed model are demonstrated using four real datasets.

Keywords Mixture models · Factor analysis · Skew distributions · EM algorithm · Clustering

Mathematics Subject Classification 62H30

Corresponding author: Geoffrey J. McLachlan [email protected]

1 School of Mathematical Science, University of Adelaide, Adelaide, South Australia 5005, Australia
2 Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
3 Department of Public Health, China Medical University, Taichung, Taiwan
4 School of Mathematics and Physics, University of Queensland, Saint Lucia 4072, Australia

1 Introduction

The factor analysis (FA) model and mixtures of factor analyzers (MFA) play a vital role in statistical data analysis, in particular, in cluster analysis, dimension reduction, and density estimation. Their usefulness was demonstrated in a wide range of applications in different fields such as bioinformatics (McLachlan et al. 2003), computer experiment (Zhoe and Mobasher 2006), pattern recognition (Yamamoto et al. 2005), social and psychological sciences (Wall et al. 2012), and environmental sciences (Maruotti et al. 2017). The traditional formulation of the MFA model assumes that the latent component factors and errors jointly follow a multivariate normal distribution. However, in applied problems, the data will not always follow the