Transform-Both-Sides Regression


16.1 Background

Fitting multiple regression models by the method of least squares is one of the most commonly used methods in statistics. There are a number of challenges to the use of least squares, even when it is used only for estimation and not for inference, including the following.

1. How should continuous predictors be transformed so as to get a good fit?
2. Is it better to transform the response variable? How does one find a good transformation that simplifies the right-hand side of the equation?
3. What if Y needs to be transformed non-monotonically (e.g., |Y − 100|) before it will have any correlation with X? (A small simulation of this point is sketched after these lists.)

When one is trying to draw an inference about population effects using confidence limits or hypothesis tests, the most common approach is to assume that the residuals have a normal distribution. This is equivalent to assuming that the conditional distribution of the response Y given the set of predictors X is normal, with mean depending on X and variance that is (one hopes) a constant independent of X. The need for a distributional assumption to enable us to draw inferences creates a number of other challenges, such as the following.

1. If, on the untransformed original scale of the response Y, the distribution of the residuals is not normal with constant spread, ordinary methods will not yield correct inferences (e.g., confidence intervals will not have the desired coverage probability, and the intervals will need to be asymmetric).
2. Quite often there is a transformation of Y that will yield well-behaved residuals. How do you find this transformation? Can you find a transformation for the Xs at the same time?


3. All classical statistical inferential methods assume that the full model was pre-specified, that is, the model was not modified after examining the data. How does one correct confidence limits, for example, for data-based model and transformation selection?
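To make the third challenge in the first list above concrete, the following small R simulation (made-up data and arbitrary constants, purely an illustrative sketch) generates a response that shows no association with X on its original scale but a strong association after the non-monotone transformation |Y − 100|.

set.seed(1)
n <- 200
x <- runif(n)
## Y falls symmetrically on either side of 100, at a distance that grows with x
y <- 100 + sample(c(-1, 1), n, replace = TRUE) * (5 * x + rnorm(n, sd = 0.5))

cor(x, y)                                  # near zero on the original scale
cor(x, abs(y - 100))                       # strong after the non-monotone transform
summary(lm(y ~ x))$r.squared               # least squares finds essentially nothing
summary(lm(abs(y - 100) ~ x))$r.squared    # a good fit once Y is transformed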

16.2 Generalized Additive Models

Hastie and Tibshirani [275] developed generalized additive models (GAMs) for a variety of distributions for Y. There are semiparametric GAMs, but most GAMs for continuous Y assume that the conditional distribution of Y comes from a specific distribution family. GAMs nicely estimate the transformation each continuous X requires so as to optimize a fitting criterion such as the sum of squared errors or the log likelihood, subject to the degrees of freedom the analyst wishes to spend on each predictor. However, GAMs assume that Y has already been transformed to fit the specified distribution family. Excellent software is available for fitting a wide variety of GAMs, such as the R packages gam, mgcv, and robustgam.
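As a brief illustration of this kind of fit, the following sketch uses the mgcv package on simulated data; the variable names, smoothing choices, and data-generating model are arbitrary assumptions, not taken from this chapter. Each continuous predictor receives a penalized smooth term whose shape is estimated from the data, while Y itself is modeled on its original scale under an assumed Gaussian family.

library(mgcv)

set.seed(2)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.3)
d  <- data.frame(y, x1, x2)

## s() requests a penalized regression spline; k limits the d.f. spent on each predictor
fit <- gam(y ~ s(x1, k = 10) + s(x2, k = 10), family = gaussian(), data = d)
summary(fit)          # effective degrees of freedom and approximate tests for each smooth
plot(fit, pages = 1)  # estimated transformation of each predictor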

16.3 Nonparametric Estimation of Y-Transformation

When the model's left-hand side also needs transformation, either to improve R² or to achieve constant variance of the residuals,
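As one hedged illustration of the general idea named in this section's title, a transformation of Y can be estimated nonparametrically, together with transformations of the Xs, using the ACE and AVAS algorithms in the R package acepack. The simulated data and variable names below are hypothetical, and this sketch is not necessarily the specific method developed in this chapter.

library(acepack)

set.seed(3)
n <- 300
x <- cbind(x1 = runif(n), x2 = runif(n))
## Y is generated so that log(Y) is additive in x1 and x2 with constant variance
y <- exp(1 + 2 * x[, "x1"] - x[, "x2"] + rnorm(n, sd = 0.3))

fit <- avas(x, y)                # AVAS: additivity and variance stabilization
plot(y, fit$ty)                  # estimated transformation of Y (roughly logarithmic here)
plot(x[, "x1"], fit$tx[, 1])     # estimated transformation of x1 (roughly linear here)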