Regression and Hierarchical Regression Models

Linear regression is one of the most commonly used methods in both classical and Bayesian statistics.

10.1 Review of Linear Regression

Recall that in regression analysis, we have two or more variables that can be measured on the same subjects. We wish to use one or more of them, the predictor variables (also called independent variables or covariates), to explain or predict a response variable (also called an outcome variable or a dependent variable). Which variable is the response and which are predictors depends on our research question. In linear regression, the response variable is quantitative. In simple linear regression, there is only one predictor variable, and the relationship between the response variable and the predictor is roughly linear. Typically, the notation Y is used for the response variable and X for a predictor, so that yᵢ and xᵢ denote the observed values of the response and the predictor for the ith subject in a dataset. The population regression equation with one covariate is

    Yᵢ = β₀ + β₁Xᵢ + εᵢ

where β₀ is the intercept (usually defined as the expected value of Y when X = 0) and β₁ is the slope (the expected difference between two Y values whose corresponding X values differ by one unit). A crucial assumption underlying linear regression is that the expected values of the Y variable, when plotted against the values of the X variable, lie on a straight line. The εᵢ represent the random differences between individual observed Y values and their expected values on the regression line. The term for these random differences is errors, but with no implication that they are mistakes or wrong in any way.
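To make the model concrete, here is a minimal R sketch (R being the language used throughout this book) that simulates data from this model under assumed values β₀ = 2, β₁ = 0.5, and σ = 1, and fits the line with lm(). All variable names and numeric values are illustrative, not taken from the text.

    # Simulate n = 50 observations from Y_i = beta0 + beta1 * X_i + eps_i
    set.seed(42)                  # for reproducibility
    n     <- 50
    beta0 <- 2                    # assumed true intercept
    beta1 <- 0.5                  # assumed true slope
    sigma <- 1                    # assumed true error standard deviation
    x     <- runif(n, 10, 30)     # covariate values
    y     <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)

    # Fit the simple linear regression and inspect the estimates
    fit <- lm(y ~ x)
    summary(fit)                  # estimates of beta0, beta1, and residual SD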


A second regression assumption, which is needed in frequentist analysis to calculate p-values and confidence intervals, is that the errors follow a normal distribution with mean zero. Thus, the three unknown parameters in simple linear regression are β₀, β₁, and the variance σ² of the normal distribution of the errors. The slope β₁ is usually of greatest interest, since it captures the relationship between the two variables.
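Continuing the simulated example above, a brief sketch of how these three parameters and the normality assumption enter a frequentist analysis; fit is the lm object from the previous sketch.

    # Frequentist inference that relies on the normal-errors assumption:
    confint(fit, level = 0.95)   # confidence intervals for beta0 and beta1
    summary(fit)$sigma           # estimate of the error SD sigma

    # Informal graphical check of the normality assumption
    qqnorm(resid(fit))           # points near a straight line support normality
    qqline(resid(fit))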

10.1.1 Centering the Covariate

When all of the possible values of the covariate are of the same sign and lie far away from zero, the mathematical definition of the intercept may not make sense substantively. For example, suppose the population of interest is adult males, the covariate is height in inches, and the response variable is weight in pounds. Then, although the intercept is a perfectly valid mathematical construct and is needed to make the line lie in the right place, the notion of an adult with height 0 inches is nonsensical. In such cases, a common practice is to center the covariate before using sample data to estimate the regression coefficients and variance.
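As a sketch of centering in R, using hypothetical height and weight vectors (the numbers below are made up for illustration): subtracting the sample mean from the covariate makes the intercept the expected weight at average height rather than at the nonsensical height 0.

    # Hypothetical data: heights (inches) and weights (pounds) of adult males
    set.seed(1)
    height <- rnorm(100, mean = 70, sd = 3)
    weight <- -130 + 4 * height + rnorm(100, sd = 15)  # made-up coefficients

    # Center the covariate at its sample mean
    height_c <- height - mean(height)

    # Fit with the original and the centered covariate
    fit_raw      <- lm(weight ~ height)     # intercept: expected weight at height 0
    fit_centered <- lm(weight ~ height_c)   # intercept: expected weight at mean height

    coef(fit_raw)       # slope is the same in both fits
    coef(fit_centered)  # intercept is now substantively interpretable

Centering leaves the slope estimate unchanged; only the intercept and its interpretation shift.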