Modelling Binary Outcomes

Logistic Regression

Gail M. Williams and Robert Ware

Abstract

This chapter introduces regression, a powerful statistical technique for predicting health outcomes from data collected on a set of observed variables. We usually want to identify the variables that contribute to the outcome, either by increasing or by decreasing risk, and to quantify their effects. A major task within this framework is to separate out the variables that are independently the most important, after controlling for other associated variables. We do this using a statistical model. We demonstrate the use of logistic regression, the form of regression used when the health outcome of interest is binary; for example, dead/alive or recovered/not recovered.

The Generalized Linear Model (GLM)

Statistical models are mathematical representations of data, that is, mathematical formulae that relate an outcome to its predictors. An outcome may be a mean (e.g. blood pressure), a risk (e.g. the probability of a complication after surgery), or some other measure. The predictors (or explanatory variables) may be quantitative or categorical variables. They may be causes of the outcome (as in smoking causes heart failure) or markers of an outcome (more aggressive treatment may be a marker for more severe disease, which is associated with a poor health outcome). Generically, a fitted statistical model is represented by linear equations as shown in Fig. 10.1. ‘Outcome’ is the predicted value of the outcome for an individual who has a particular combination of values for predictors 1, 2, 3 and so on. The coefficients are estimated from the data and are the quantities we are usually most interested in. The value of a predictor for an individual is multiplied by the corresponding coefficient to give the contribution of that predictor to the outcome. So, in

G.M. Williams (*) • R. Ware
School of Population Health, University of Queensland, Herston, QLD, Australia
e-mail: [email protected]

S.A.R. Doi and G.M. Williams (eds.), Methods of Clinical Epidemiology, Springer Series on Epidemiology and Public Health, DOI 10.1007/978-3-642-37131-8_10, © Springer-Verlag Berlin Heidelberg 2013

Fig. 10.1 A fitted GLM depicted mathematically
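The figure itself is not reproduced in this text. As a minimal sketch, assuming the linear form described in the surrounding text (a constant plus one coefficient × predictor term per predictor), the predicted outcome could be computed as follows. The function name and all numeric values are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
def predict_outcome(constant, coefficients, predictors):
    """Predicted outcome = constant + sum of (coefficient * predictor value).

    Assumes one coefficient per predictor, in the same order.
    """
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    contributions = [b * x for b, x in zip(coefficients, predictors)]
    return constant + sum(contributions)


# Hypothetical fitted model with three predictors (illustrative values only).
constant = 100.0
coefficients = [0.5, 0.0, -2.0]   # the second coefficient is zero
person = [40.0, 1.0, 3.0]         # one individual's predictor values

predicted = predict_outcome(constant, coefficients, person)
print(predicted)  # 100 + 0.5*40 + 0.0*1 - 2.0*3 = 114.0
```

In this sketch, changing the second predictor's value leaves the prediction unchanged because its coefficient is zero, and setting every predictor to zero returns the constant.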

particular, if a coefficient for a predictor is estimated to be zero, then that predictor makes no contribution to the outcome. The constant coefficient represents the predicted value of the outcome when the values of all of the predictors are zero. This may or may not be of interest or interpretable, because zero may not be in the observable range of a predictor. So the model predicts the value of an outcome from each person’s set of predictor values. The predicted value, of course, generally does not match that person’s actual observed value. The difference between the observed value and the predicted value is called the residual, or sometimes the error. The term error does not imply a mistake but rather represents the value of a random variable measu