General Aspects of Fitting Regression Models


2.1 Notation for Multivariable Regression Models

The ordinary multiple linear regression model is frequently used and has parameters that are easily interpreted. In this chapter we study a general class of regression models, those stated in terms of a weighted sum of a set of independent or predictor variables. It is shown that after linearizing the model with respect to the predictor variables, the parameters in such regression models are also readily interpreted. Also, all the designs used in ordinary linear regression can be used in this general setting. These designs include analysis of variance (ANOVA) setups, interaction effects, and nonlinear effects. Besides describing and interpreting general regression models, this chapter also describes, in general terms, how the three types of assumptions of regression models can be examined.

First we introduce notation for regression models. Let $Y$ denote the response (dependent) variable, and let $X = X_1, X_2, \ldots, X_p$ denote a list or vector of predictor variables (also called covariables or independent, descriptor, or concomitant variables). These predictor variables are assumed to be constants for a given individual or subject from the population of interest. Let $\beta = \beta_0, \beta_1, \ldots, \beta_p$ denote the list of regression coefficients (parameters). $\beta_0$ is an optional intercept parameter, and $\beta_1, \ldots, \beta_p$ are weights or regression coefficients corresponding to $X_1, \ldots, X_p$. We use matrix or vector notation to describe a weighted sum of the $X$s:

$$X\beta = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p, \tag{2.1}$$

where there is an implied $X_0 = 1$.

A regression model is stated in terms of a connection between the predictors $X$ and the response $Y$. Let $C(Y \mid X)$ denote a property of the distribution of $Y$ given $X$ (as a function of $X$). For example, $C(Y \mid X)$ could be $E(Y \mid X)$, the expected value or average of $Y$ given $X$, or $C(Y \mid X)$ could be the probability that $Y = 1$ given $X$ (where $Y = 0$ or 1).

© Springer International Publishing Switzerland 2015

F.E. Harrell, Jr., Regression Modeling Strategies, Springer Series in Statistics, DOI 10.1007/978-3-319-19425-7_2
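The weighted sum in Eq. (2.1) can be sketched in a few lines of Python. This is only an illustration for a single subject: the function name `linear_predictor`, the `intercept` flag, and the numeric values are assumptions made for the example and do not come from the text.

```python
def linear_predictor(x, beta, intercept=True):
    """Return X*beta = beta_0 + beta_1*x_1 + ... + beta_p*x_p for one subject.

    x    : list of p predictor values (x_1, ..., x_p)
    beta : list of coefficients; (beta_0, ..., beta_p) if intercept=True,
           otherwise (beta_1, ..., beta_p)
    """
    # The intercept is handled through the implied predictor X0 = 1.
    row = [1.0] + list(x) if intercept else list(x)
    if len(row) != len(beta):
        raise ValueError("length of beta does not match number of predictors")
    return sum(b * v for b, v in zip(beta, row))


# Hypothetical values chosen only for illustration.
beta = [1.0, 2.0, -0.5]  # beta_0, beta_1, beta_2
x = [3.0, 4.0]           # x_1, x_2
print(linear_predictor(x, beta))  # 1 + 2*3 - 0.5*4 = 5.0
```

Treating the intercept as the coefficient of a constant predictor keeps the computation a single dot product, which is what the matrix notation $X\beta$ expresses.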

2.2 Model Formulations

We define a regression function as a function that describes interesting properties of $Y$ that may vary across individuals in the population. $X$ describes the list of factors determining these properties. Stated mathematically, a general regression model is given by

$$C(Y \mid X) = g(X). \tag{2.2}$$

We restrict our attention to models that, after a certain transformation, are linear in the unknown parameters, that is, models that involve $X$ only through a weighted sum of all the $X$s. The general linear regression model is given by

$$C(Y \mid X) = g(X\beta). \tag{2.3}$$
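To make Eq. (2.3) concrete, the sketch below applies two choices of $g$ to the same linear predictor: the identity, as in ordinary linear regression, and the inverse logit, as in binary logistic regression. The names `regression_function`, `identity`, and `inverse_logit`, and the coefficient values, are hypothetical and serve only as an illustration.

```python
import math


def identity(lp):
    """g for ordinary linear regression: C(Y|X) = E(Y|X) = X*beta."""
    return lp


def inverse_logit(lp):
    """g for binary logistic regression: 1 / (1 + exp(-X*beta))."""
    # Branch on the sign so that exp() never receives a large positive argument.
    if lp >= 0:
        return 1.0 / (1.0 + math.exp(-lp))
    e = math.exp(lp)
    return e / (1.0 + e)


def regression_function(x, beta, g):
    """Evaluate C(Y|X) = g(X*beta); beta[0] is the intercept beta_0."""
    lp = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return g(lp)


# Hypothetical coefficients: beta_0 = -1, beta_1 = 0.5, with x_1 = 2,
# so the linear predictor is 0.
beta = [-1.0, 0.5]
x = [2.0]
print(regression_function(x, beta, identity))       # 0.0
print(regression_function(x, beta, inverse_logit))  # 0.5
```

Both models involve $X$ only through the same weighted sum; only the function $g$ connecting that sum to the property $C(Y \mid X)$ differs.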

For example, the ordinary linear regression model is

$$C(Y \mid X) = E(Y \mid X) = X\beta, \tag{2.4}$$

and given $X$, $Y$ has a normal distribution with mean $X\beta$ and constant variance $\sigma^2$. The binary logistic regression model [129, 647] is

$$C(Y \mid X) = \mathrm{Prob}\{Y = 1 \mid X\} = (1 + \exp(-X\beta))^{-1}, \tag{2.5}$$

where Y can take on the v