Greedy Variance Estimation for the LASSO

Christopher Kennedy · Rachel Ward

Department of Mathematics, University of Texas at Austin, Austin, USA

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Abstract Recent results have proven the minimax optimality of LASSO and related algorithms for noisy linear regression. However, these results tend to rely on variance estimators that are inefficient or on optimizations that are slower than LASSO itself. We propose an efficient estimator for the noise variance in high-dimensional linear regression that is faster than LASSO, requiring only p matrix–vector multiplications. We prove that this estimator is consistent, with a good rate of convergence, under the condition that the design matrix satisfies the restricted isometry property (RIP). In practice, our estimator scales well into high dimensions, is highly parallelizable, and incurs only a modest bias.

Keywords Estimate · Noise · Restricted isometry · Sparsity · Variance
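To make the flavor of this concrete before the formal development, the following is a minimal numerical sketch of a greedy, residual-based variance estimate in the spirit described above: greedily select columns by correlation with the running residual (as in orthogonal matching pursuit) and read the noise level off the leftover residual energy. The selection rule, the assumed sparsity level s, and the normalization are our illustrative choices here, not the exact estimator analyzed in the paper.

```python
import numpy as np

def greedy_variance_estimate(X, y, s):
    """Illustrative greedy (OMP-style) residual-based noise-variance estimate.

    Greedily selects s columns of X by correlation with the current
    residual, projects y onto their span, and estimates the noise
    variance from the remaining residual energy.  A generic sketch,
    not necessarily the procedure analyzed in the paper.
    """
    n, _ = X.shape
    support = []
    residual = y.copy()
    for _ in range(s):
        # Correlations of all p columns with the current residual:
        # a single matrix-vector multiplication with X^T.
        corr = X.T @ residual
        corr[support] = 0.0              # never reselect a chosen column
        support.append(int(np.argmax(np.abs(corr))))
        # Least-squares projection of y onto the selected columns.
        Xs = X[:, support]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        residual = y - Xs @ coef
    # Residual energy per remaining degree of freedom.
    return float(residual @ residual) / (n - s)
```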

1 Introduction

The LASSO [23] is a classical algorithm for noisy linear regression in the setting where the number of regression coefficients exceeds the number of response variables, p > n. The analysis of LASSO has recently surged, with much work on establishing oracle inequalities for $\ell_2$ estimation over sparse vectors, along with the corresponding minimax rates. Typically, such results rely on knowledge of the variance of the noise, which is unknown in practice. The literature on LASSO is immense and beyond the scope of this paper, but we point to a few important references on oracle inequalities and the corresponding minimax error rates (see [4,5,16,17,19,24,27–31]). A good review of variance estimators for LASSO is given in [20], where variance estimation using cross-validated LASSO is highlighted as particularly strong in many sparsity regimes.
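For reference, we record the model underlying all of these results. In the standard sparse linear model one observes $y = X\beta^* + \varepsilon$, with design matrix $X \in \mathbb{R}^{n \times p}$, sparse coefficients $\beta^*$, and noise $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$; the LASSO estimate is

$$\hat{\beta} \in \mathop{\arg\min}_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1,$$

stated here in one common normalization (conventions for the penalty scaling vary). The penalty level $\lambda$ that achieves the minimax rates above typically scales like $\sigma \sqrt{\log(p)/n}$, which is precisely why an estimate of the unknown noise variance $\sigma^2$ is needed in practice.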


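As a concrete point of comparison, here is a minimal sketch of such a cross-validated-LASSO variance estimator, assuming scikit-learn's LassoCV and the common residual-based normalization by n minus the size of the selected support; the fold count and the exact degrees-of-freedom correction are choices that vary across the literature.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def cv_lasso_variance_estimate(X, y, n_folds=10):
    """Residual-based noise-variance estimate after cross-validated LASSO.

    Chooses the LASSO penalty by k-fold cross-validation, then estimates
    the noise variance from the residual sum of squares, corrected by
    the number of selected (nonzero) coefficients.
    """
    n = X.shape[0]
    fit = LassoCV(cv=n_folds).fit(X, y)
    residual = y - fit.predict(X)
    s_hat = np.count_nonzero(fit.coef_)   # size of the selected support
    return float(residual @ residual) / (n - s_hat)
```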
This method typically uses 5- or 10-fold cross-validation to train the hyperparameters in LASSO, and its analysis relies on the restricted eigenvalue condition on the design matrix. That work was later complemented by a theoretical analysis of a slightly modified variant of cross-validated LASSO in [7] (see also, e.g., [9,13]). The method of moments (see [8]) is a reasonable alternative to cross-validated LASSO: it assumes that the design matrix is Gaussian and exploits the resulting statistical properties to formulate an estimator. It is consistent with a good rate of convergence [8], but the requirement of a Gaussian design matrix is restrictive. We should also mention a variant of the LASSO, the square-root LASSO (see [3]), whose penalty level does not depend on the variance of the noise. However, the resulting estimator is formulated as a conic programming problem, which can be inefficient in practice and is beyond the scope of this work.

1.1 Our Contribution

The main contributions of our paper are the following:

• We provide a fast varia