Obtaining a threshold for the Stewart index and its extension to ridge regression
Ainara Rodríguez Sánchez¹ · Catalina García García² · Román Salmerón Gómez²

Received: 24 April 2020 / Accepted: 7 November 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
The linear regression model is widely applied to measure the relationship between a dependent variable and a set of independent variables. When the independent variables are related to each other, the model is said to present collinearity. If the relationship is between the intercept and at least one of the independent variables, the collinearity is nonessential, while if the relationship is among the independent variables (excluding the intercept), the collinearity is essential. The Stewart index allows the detection of both types of near multicollinearity. However, to the best of our knowledge, there are no established thresholds for this measure from which to consider the multicollinearity worrying. Establishing such a threshold is the main goal of this paper, which presents a Monte Carlo simulation to relate this measure to the condition number. An additional goal of this paper is to extend the Stewart index for application after estimation by ridge regression, which is widely applied to estimate models with multicollinearity as an alternative to ordinary least squares (OLS). This extension could also be applied to determine an appropriate value for the ridge factor.

Keywords Linear regression · Multicollinearity · Ridge regression · Stewart index
Ainara Rodríguez Sánchez: [email protected]
Román Salmerón Gómez: [email protected]
Catalina García García: [email protected]

¹ Department of Economic Theory and History, University of Granada, Granada, Spain
² Department of Quantitative Methods for Economics and Business, University of Granada, Granada, Spain
1 Introduction

The linear regression model is applied to analyze the effect of a set of explanatory (independent) variables, x1, …, xp (p ≥ 1), on an explained (dependent) variable, y. With n observations and p independent variables, the model is defined as follows:

y = Xβ + u,  (1)
where u is a random disturbance (assumed to be spherical with variance σ²), X (of dimension n × p) is the matrix of observations of the independent variables, whose first column, x1 = (1, 1, …, 1)^t, corresponds to the intercept, and y (of dimension n × 1) is the vector of observations of the dependent variable. When a linear relationship exists between the independent variables of the model (excluding the intercept), the model is said to present essential collinearity. If the relationship is between the intercept and one (or more) of the independent variables, the collinearity is said to be nonessential (Marquardt and Snee 1975). In either case, if the model presents any kind of collinearity, the estimation by ordinary least squares (OLS) may be unstable, among other possible consequences. It is therefore important to analyze the possible existence of collinearity using appropriate diagnostic measures. The variance inflation factor (VI
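As an illustrative sketch (not taken from the paper), the following Python snippet builds a design matrix exhibiting both kinds of near collinearity described above and computes its condition number, i.e., the ratio of the largest to the smallest singular value of X, one of the standard diagnostics the paper relates the Stewart index to. The simulated variables and the sample size are assumptions chosen only for the example.

```python
import numpy as np

# Simulate a design matrix with both types of near multicollinearity:
#  - nonessential: x2 has a mean far from zero relative to its spread,
#    so it is nearly proportional to the intercept column of ones;
#  - essential: x3 is almost an exact copy of x2.
rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(10.0, 1.0, n)          # related to the intercept
x3 = x2 + rng.normal(0.0, 0.01, n)     # nearly linearly dependent on x2
X = np.column_stack([np.ones(n), x2, x3])

# Condition number of X: ratio of extreme singular values.
singular_values = np.linalg.svd(X, compute_uv=False)
condition_number = singular_values[0] / singular_values[-1]
print(condition_number)  # a large value signals worrying multicollinearity
```

A common rule of thumb treats condition numbers above 20–30 as a sign of worrying multicollinearity; the near-duplicate column x3 here pushes the value far beyond that range.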