Equivalence between adaptive Lasso and generalized ridge estimators in linear regression with orthogonal explanatory variables after optimizing regularization parameters

Mineaki Ohishi¹ · Hirokazu Yanagihara¹ · Shuichi Kawano²

Received: 14 February 2019 / Revised: 19 August 2019
© The Institute of Statistical Mathematics, Tokyo 2019
Abstract

In this paper, we deal with a penalized least-squares (PLS) method for a linear regression model with orthogonal explanatory variables. The penalties used are an adaptive Lasso (AL)-type $\ell_1$ penalty (AL penalty) and a generalized ridge (GR)-type $\ell_2$ penalty (GR penalty). Since the estimators obtained by minimizing the PLS criteria strongly depend on the regularization parameters, we optimize these parameters by a model selection criterion (MSC) minimization method. The estimators based on the AL penalty and the GR penalty have different properties, and it is universally recognized that they are completely different estimators. However, in this paper, we show the interesting result that, when the explanatory variables are orthogonal, the two estimators are exactly equal after the regularization parameters are optimized by the MSC minimization method.

Keywords Adaptive Lasso · $C_p$ criterion · GCV criterion · Generalized ridge regression · GIC · Linear regression · Model selection criterion · Optimization problem · Regularization parameters · Sparsity
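To fix ideas before the formal development, the two penalized least-squares problems named in the abstract can be written in their standard textbook forms. The display below is only a sketch under that standard parametrization (weights $w_1, \ldots, w_k$, a common tuning parameter $\lambda$ for the AL penalty, and coordinatewise parameters $\lambda_1, \ldots, \lambda_k$ for the GR penalty); the paper's own definitions and scaling conventions are introduced in later sections.

```latex
% Standard (textbook) forms of the two PLS problems; the weights w_j and the
% regularization parameters lambda, lambda_j are illustrative placeholders.
\begin{align*}
\hat{\boldsymbol{\beta}}_{\mathrm{AL}}
  &= \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^{k}}
     \left\{ \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\|^{2}
       + \lambda \sum_{j=1}^{k} w_{j} |\beta_{j}| \right\}
   \quad \text{(AL-type $\ell_1$ penalty)}, \\
\hat{\boldsymbol{\beta}}_{\mathrm{GR}}
  &= \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^{k}}
     \left\{ \|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\|^{2}
       + \sum_{j=1}^{k} \lambda_{j} \beta_{j}^{2} \right\}
   \quad \text{(GR-type $\ell_2$ penalty)}.
\end{align*}
```

Under the orthogonality assumption stated in Sect. 1, both problems separate across the coordinates of $\boldsymbol{\beta}$, which is what the numerical sketch after the model assumptions illustrates.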
The second author was partially supported by the Ministry of Education, Science, Sports, and Culture through a Grant-in-Aid for Scientific Research (C), #18K03415, 2018–2021, and the last author was supported by JSPS KAKENHI Grant Number JP19K11854.

Corresponding author: Mineaki Ohishi, [email protected]

¹ Department of Mathematics, Graduate School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8526, Japan

² Department of Computer and Network Engineering, Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan

1 Introduction

We deal with a linear regression model with an $n$-dimensional vector of response variables $\boldsymbol{y} = (y_1, \ldots, y_n)^\top$ and an $n \times k$ matrix of nonstochastic explanatory variables
$\boldsymbol{X}$, where $n$ is the sample size and $k$ is the number of explanatory variables. Here, without loss of generality, we assume that $\boldsymbol{y}$ and $\boldsymbol{X}$ are centralized, i.e., $\boldsymbol{y}^\top \boldsymbol{1}_n = 0$ and $\boldsymbol{X}^\top \boldsymbol{1}_n = \boldsymbol{0}_k$, where $\boldsymbol{1}_n$ is an $n$-dimensional vector of ones and $\boldsymbol{0}_k$ is a $k$-dimensional vector of zeros. Moreover, in this paper, we particularly assume that the following equations hold:
$$
\mathrm{rank}(\boldsymbol{X}) = k < n - 1, \qquad
\boldsymbol{X}^\top \boldsymbol{X} = \boldsymbol{D} = \mathrm{diag}(d_1, \ldots, d_k), \quad d_1 \ge \cdots \ge d_k > 0.
$$
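As a small numerical illustration of this setting (a sketch, not the paper's code), the following Python snippet builds a centered design with orthogonal columns, checks that $\boldsymbol{X}^\top \boldsymbol{X}$ is diagonal, and evaluates the componentwise least-squares estimate together with the standard closed-form solutions of AL- and GR-type penalized least squares under orthogonality. The weights $w_j$, the tuning parameter $\lambda$, and the $\lambda_j$ are arbitrary placeholders here, whereas the paper chooses them by MSC minimization.

```python
# A minimal sketch (not the paper's code): centered orthogonal design,
# componentwise least-squares, and the standard closed-form AL and GR estimates.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4

# Centered design with orthogonal columns: center random columns, orthonormalize
# them by QR, then rescale so that the d_j = x_j'x_j are distinct and decreasing.
Z = rng.normal(size=(n, k))
Z -= Z.mean(axis=0)                       # column centering, so X'1_n = 0_k below
Q, _ = np.linalg.qr(Z)                    # orthonormal columns in the centered space
X = Q * np.array([4.0, 3.0, 2.0, 1.0])    # orthogonal columns with d_1 >= ... >= d_k

beta_true = np.array([2.0, 0.0, -1.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)
y -= y.mean()                             # center the response, so y'1_n = 0

D = X.T @ X
d = np.diag(D)
assert np.allclose(D, np.diag(d))         # X'X = D = diag(d_1, ..., d_k)

# Componentwise least-squares estimate under orthogonality: beta_LS_j = x_j'y / d_j.
beta_ls = (X.T @ y) / d

# GR-type l2 penalty: minimize ||y - Xb||^2 + sum_j lam_j * b_j^2 with placeholder
# lam_j; each coordinate is shrunk multiplicatively by d_j / (d_j + lam_j).
lam_gr = np.array([1.0, 5.0, 0.5, 10.0])
beta_gr = d / (d + lam_gr) * beta_ls

# AL-type l1 penalty: minimize ||y - Xb||^2 + lam * sum_j w_j * |b_j| with
# placeholder weights w_j = 1 / |beta_LS_j|; under X'X = D the solution is
# coordinatewise soft-thresholding of beta_LS_j at the level lam * w_j / (2 * d_j).
lam_al = 0.5
w = 1.0 / np.abs(beta_ls)
beta_al = np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam_al * w / (2.0 * d), 0.0)

print("LS:", np.round(beta_ls, 3))
print("GR:", np.round(beta_gr, 3))
print("AL:", np.round(beta_al, 3))
```

In this toy example the GR estimate shrinks every coordinate toward zero without setting any of them exactly to zero, while the AL estimate zeroes out the weakly related coordinates, which reflects the different properties of the two penalties mentioned in the abstract.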
The relation $\boldsymbol{X}^\top \boldsymbol{X} = \boldsymbol{D}$ indicates that the explanatory variables are orthogonal. Examples of models with orthogonal explanatory variables include those of principal component analysis (Massy 1965; Jolliffe 1982; Yanagihara 2018), generalized ridge (GR) regression (Hoerl and Kennard 1970), and smoothing using orthogonal basis functions (Yanagihara 2012; Hagiwara 2017). The least-squares (LS) method