Robust high dimensional expectation maximization algorithm via trimmed hard thresholding
Di Wang1 · Xiangyu Guo1 · Shi Li1 · Jinhui Xu1

Received: 16 April 2020 / Revised: 21 July 2020 / Accepted: 17 October 2020
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2020
Abstract
In this paper, we study the problem of estimating latent variable models with arbitrarily corrupted samples in high dimensional space (i.e., d ≫ n), where the underlying parameter is assumed to be sparse. Specifically, we propose a method called Trimmed (Gradient) Expectation Maximization, which adds a trimming-gradients step and a hard thresholding step to the Expectation step (E-step) and the Maximization step (M-step), respectively. We show that under some mild assumptions and with an appropriate initialization, the algorithm is corruption-proof and converges geometrically to the (near) optimal statistical rate when the fraction of corrupted samples 𝜖 is bounded by Õ(1/√n). Moreover, we apply our general framework to three canonical models: mixture of Gaussians, mixture of regressions, and linear regression with missing covariates. Our theory is supported by thorough numerical results.

Keywords Robust statistics · High dimensional statistics · Gaussian mixture model · Expectation maximization · Iterative hard thresholding
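To make the two ingredients above concrete, the following is a minimal NumPy sketch of one update of such a trimmed-gradient scheme. The function name trimmed_gradient_em_step, its arguments, and the simple symmetric coordinate-wise trimming rule are illustrative choices of ours, not code taken from the paper.

import numpy as np

def trimmed_gradient_em_step(grads, beta, eta, s, eps):
    # grads: (n, d) per-sample gradients of the Q-function at the current
    #        estimate beta (the output of the E-step); beta: (d,) estimate;
    #        eta: step size; s: sparsity level; eps: assumed corruption fraction.
    n, d = grads.shape
    k = int(np.ceil(eps * n))                 # samples trimmed at each tail

    # Trimming step: for every coordinate, discard the k largest and k smallest
    # gradient values, then average the rest (robust to the corrupted samples).
    sorted_grads = np.sort(grads, axis=0)
    trimmed_mean = sorted_grads[k:n - k].mean(axis=0)

    # Gradient ascent step followed by hard thresholding: keep only the s
    # largest-magnitude coordinates of the updated estimate.
    beta_new = beta + eta * trimmed_mean
    out = np.zeros(d)
    support = np.argsort(np.abs(beta_new))[-s:]
    out[support] = beta_new[support]
    return out

In each iteration, grads would be recomputed from the data at the new estimate, so the trimming acts on the gradients of the Q-function rather than on the raw samples.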
D. Wang and X. Guo have contributed equally.
Editors: Kee-Eung Kim, Vineeth N. Balasubramanian.
* Jinhui Xu [email protected]
Di Wang [email protected]
Xiangyu Guo [email protected]
Shi Li [email protected]
1 Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA
1 Introduction

As one of the most popular techniques for maximum likelihood estimation in mixture models and incomplete data problems, the Expectation Maximization (EM) algorithm has been widely applied to many areas such as genomics (Laird 2010), finance (Faria and Gonçalves 2013), and crowdsourcing (Dawid and Skene 1979). Although the EM algorithm is well known to converge to an empirically good local estimator (Wu et al. 1983), finite sample statistical guarantees for its performance were not established until recent studies (Balakrishnan et al. 2017b; Zhu et al. 2017; Wang et al. 2015; Yi and Caramanis 2015). Specifically, the first local convergence theory and finite sample statistical rate of convergence for the classical EM algorithm and its gradient ascent variant (gradient EM) were established in Balakrishnan et al. (2017b). Later, Wang et al. (2015) extended the classical EM and gradient EM algorithms to the high dimensional sparse setting; the key idea in their methods is an additional truncation step after the M-step, which exploits the intrinsic sparse structure of high dimensional latent variable models. Yi and Caramanis (2015) also studied the high dimensional sparse EM algorithm and proposed a method that uses a regularized M-estimator in the M-step.
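As a concrete picture of that truncation step (a minimal sketch; the function name truncate and the small example are ours, not code from the cited papers), the operator keeps the s largest-magnitude coordinates of the M-step output and zeroes out the rest:

import numpy as np

def truncate(beta, s):
    # Hard-thresholding (truncation) operator: keep the s entries of beta with
    # the largest absolute value and set all remaining entries to zero.
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    support = np.argsort(np.abs(beta))[-s:]
    out[support] = beta[support]
    return out

# Only the two largest-magnitude coordinates survive:
print(truncate([0.1, -2.0, 0.3, 1.5, 0.0], s=2))   # [ 0.  -2.   0.   1.5  0. ]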