Zero-Truncated and Zero-Inflated Models for Count Data

A.F. Zuur et al., Mixed Effects Models and Extensions in Ecology with R, Statistics for Biology and Health, DOI 10.1007/978-0-387-87458-6_11, © Springer Science+Business Media, LLC 2009

11.1 Introduction

In this chapter, we discuss models for zero-truncated and zero-inflated count data. Zero truncated means the response variable cannot have a value of 0. A typical example from the medical literature is the duration patients are in hospital. For ecological data, think of response variables like the time a whale is at the surface before re-submerging, counts of fin rays on fish (e.g. used for stock identification), dolphin group size, age of an animal in years or months, or the number of days that carcasses of road-killed animals (amphibians, owls, birds, snakes, carnivores, small mammals, etc.) remain on the road. These are all examples for which the response variable cannot take a value of 0.

On their own, zero-truncated data are not necessarily a problem. It is the underlying assumption of Poisson and negative binomial distributions that may cause a problem, as these distributions allow zeros within their range of possible values. If the mean is small, and the response variable does not contain zeros, then the estimated parameters and standard errors obtained by a generalised linear model (GLM) may be biased. In Section 11.2, we introduce zero-truncated Poisson and zero-truncated negative binomial models as a solution for this problem. If the mean of the response variable is relatively large, then ignoring the truncation and applying a Poisson or negative binomial (NB) GLM is unlikely to cause a problem. In such cases, the estimated parameters and standard errors obtained by Poisson GLM and truncated Poisson GLM tend to be similar (the same holds for the negative binomial models).

In ecological research, you need to search very hard to find zero-truncated data. Most count data are zero inflated. This means that the response variable contains more zeros than expected, based on the Poisson or negative binomial distribution. A simple histogram or frequency plot with a large spike at zero gives an early warning of possible zero inflation. This is illustrated by the graph in Fig. 11.1, which shows the numbers of parasites for the cod dataset that was used in Chapter 10 to illustrate logistic regression. In addition to presence and absence of parasites in cod, Hemmingsen et al. (2005) also counted the number of parasites, expressed as intensity.
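The earlier remark that truncation matters mainly when the mean is small can be made concrete with a short numerical sketch. It assumes the standard textbook form of the zero-truncated Poisson, in which the Poisson probabilities are renormalised over the values 1, 2, ...; Section 11.2 gives the chapter's own formulation. The sketch is written in Python rather than R, the function names are illustrative, and the three means are arbitrary values chosen for the comparison, not estimates from any dataset.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing the count k when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zero_truncated_poisson_pmf(k, mu):
    """Poisson probability renormalised over k = 1, 2, ... (no mass at zero)."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, mu) / (1.0 - math.exp(-mu))


def zero_truncated_mean(mu):
    """Mean of the zero-truncated Poisson with underlying Poisson mean mu."""
    return mu / (1.0 - math.exp(-mu))


# Illustrative means only; they are not estimates from any data set in the chapter.
for mu in (0.5, 2.0, 10.0):
    print(f"mu = {mu:4.1f}  P(Y = 0) = {poisson_pmf(0, mu):.5f}  "
          f"truncated mean = {zero_truncated_mean(mu):.4f}")
```

With a mean of 0.5, an ordinary Poisson distribution places about 61% of its probability at zero, so a model that allows zeros is a poor description of data that cannot contain them. With a mean of 10, the probability of a zero is below 0.01%, and the truncated and untruncated distributions are nearly indistinguishable.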

[Figure 11.1: frequency plot; y-axis: Frequencies; x-axis: Observed intensity values]

Fig. 11.1 Plot of the frequencies for the response variable Intensity from the cod parasite data. There are 654 zeros, 108 ones, 71 twos, 52 threes, 44 fours, 31 fives, etc. Note the large number of zeros, indicating zero inflation. R code to make this graph is presented in Section 11.4.
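The frequencies quoted in the caption are enough for a rough numerical check of the spike at zero. For a Poisson distribution, the quantity k * n_k / n_(k-1), where n_k is the number of observations equal to k, should be roughly constant and close to the mean. The sketch below computes these ratios for the six frequencies given in the caption. It is an informal diagnostic added here for illustration, not the approach used in the chapter; it is written in Python rather than R, and the function name is hypothetical.

```python
# Frequencies quoted in the caption of Fig. 11.1: number of cod with
# 0, 1, 2, 3, 4 and 5 parasites. Higher counts are not listed in the caption.
observed = [654, 108, 71, 52, 44, 31]


def successive_ratios(freqs):
    """Return k * n_k / n_(k-1) for k = 1, 2, ...

    Under a Poisson distribution each ratio estimates the same mean,
    so the values should be roughly constant.
    """
    return [k * freqs[k] / freqs[k - 1] for k in range(1, len(freqs))]


ratios = successive_ratios(observed)
for k, ratio in enumerate(ratios, start=1):
    print(f"k = {k}: {ratio:.3f}")
```

The first ratio is about 0.17, whereas the later ones rise from roughly 1.3 to 3.5. The very low first value reflects the large number of zeros relative to ones, which is the pattern expected under zero inflation; the continued increase also suggests that the data are more variable than a Poisson distribution allows.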

In this chapter, four models are discussed that ca