GLM and GAM for Count Data
9.1 Introduction
A generalised linear model (GLM) or a generalised additive model (GAM) consists of three steps: (i) the distribution of the response variable, (ii) the specification of the systematic component in terms of explanatory variables, and (iii) the link between the mean of the response variable and the systematic part. In Chapter 8, we discussed several distributions for the response variable: Normal, Poisson, negative binomial, geometric, gamma, Bernoulli, and binomial. One of these distributions can be used for the first step. Later, in Chapter 11, we see how a mixture of two distributions can also be used for the response variable, but in this chapter we work with only one distribution at a time.
We spent a lot of time looking at distributions in Chapter 8 because our experience teaching environmental scientists shows that, in general, they are less familiar with some of these distributions, especially the negative binomial. Before reading this chapter, you should ensure that you are familiar with the material described in Chapter 8.
In this chapter, we focus on count data and use the Poisson and negative binomial distributions. In the next chapter, we concentrate on logistic regression using the binomial distribution. We revisit count data in Chapter 11, where we look at data sets with lots of zeros or no zeros. Models for these types of data use a mixture of the techniques discussed in this and the next chapter.
Good references on GLM include McCullagh and Nelder (1998), Dobson (2002), and Agresti (2002). It is possible to dedicate an entire book to Poisson or logistic regression (see, for example, Hosmer and Lemeshow, 2000; Collet, 2003). Fox (2002), Ruppert et al. (2003), Wood (2006), and Keele (2008) are excellent GAM references.
We start this chapter by showing that the linear regression model is also a GLM.
This is merely a pedagogical choice, as it allows us to start with something familiar; after all, Gaussian linear regression can also be used for count data, even though it is not the best option. In Section 9.3, the Poisson GLM is introduced using an artificial data set for which we know the regression parameters. This allows us to demonstrate what the model is actually doing. In Section 9.4, we give the likelihood criterion and show how parameters can be estimated. In Sections 9.5 to 9.9, we discuss the Poisson GLM using a real data set and focus on overdispersion, model selection, and model validation. In Section 9.10, we present the negative binomial distribution and show how it can be used if there is overdispersion. Finally, we look at GAM.
A.F. Zuur et al., Mixed Effects Models and Extensions in Ecology with R, Statistics for Biology and Health, DOI 10.1007/978-0-387-87458-6_9, © Springer Science+Business Media, LLC 2009
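To make the idea of an artificial data set with known regression parameters concrete, the following minimal sketch simulates Poisson counts from a log-link model and then recovers the parameters by maximum likelihood. It is not the data set or the code used in this book, which works in R. The intercept 0.5, slope 1.0, sample size, seed, and the helper names rpois, fit_poisson_glm, and pearson_dispersion are all assumptions chosen for illustration, and the fitting routine is a plain Newton-Raphson iteration written out by hand rather than a library call.

```python
import math
import random


def rpois(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_poisson_glm(x, y, max_iter=25, tol=1e-10):
    """Fit log(mu_i) = alpha + beta * x_i by Newton-Raphson on the Poisson log-likelihood."""
    alpha, beta = math.log(sum(y) / len(y)), 0.0  # start from an intercept-only model
    for _ in range(max_iter):
        mu = [math.exp(alpha + beta * xi) for xi in x]
        # Score vector: derivatives of the log-likelihood with respect to alpha and beta.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system information * step = score.
        d_alpha = (i11 * u0 - i01 * u1) / det
        d_beta = (i00 * u1 - i01 * u0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta


def pearson_dispersion(x, y, alpha, beta):
    """Pearson chi-square divided by residual degrees of freedom (two parameters)."""
    mu = [math.exp(alpha + beta * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)


# Assumed "true" parameters for the artificial data (illustrative values only).
true_alpha, true_beta = 0.5, 1.0
rng = random.Random(1)
x = [rng.uniform(0, 2) for _ in range(2000)]
# Step (i): Poisson response; steps (ii) and (iii): linear predictor with a log link.
y = [rpois(math.exp(true_alpha + true_beta * xi), rng) for xi in x]

alpha_hat, beta_hat = fit_poisson_glm(x, y)
dispersion = pearson_dispersion(x, y, alpha_hat, beta_hat)
print("estimated intercept:", round(alpha_hat, 3))
print("estimated slope:", round(beta_hat, 3))
print("Pearson dispersion:", round(dispersion, 3))
```

Because the counts really are generated from a Poisson distribution, the estimates should land close to the assumed values and the dispersion statistic should be close to 1. Values well above 1 on real data are the overdispersion that motivates the negative binomial distribution.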
9.2 Gaussian Linear Regression as a GLM
A GLM consists of three steps:
1. An assumption on the distribution of the response variable Y_i. T