Limitations of Linear Regression Applied on Ecological Data

This chapter revises the basic concepts of linear regression, shows how to apply linear regression in R, discusses model validation, and outlines the limitations of linear regression when applied to ecological data. Later chapters present methods to overcome some of these limitations; but, as always, before doing any complicated statistical analyses we begin with a detailed data exploration. The key concepts to consider at this stage are outliers, collinearity, and the type of relationships between the variables. Failure to carry out this initial data exploration may result in an inappropriate analysis, forcing you to reanalyse your data and rewrite your paper, thesis, or report.

We assume that the reader is ‘reasonably’ familiar with data exploration and linear regression techniques. This book is a follow-up to Analysing Ecological Data by Zuur et al. (2007), which discusses a wide range of exploration and analytical tools (including linear regression and its extensions), together with several related case study chapters. Other useful, non-mathematical textbooks containing regression chapters include Chambers and Hastie (1992), Fox (2002), Maindonald and Braun (2003), Venables and Ripley (2002), Dalgaard (2002), Faraway (2005), Verzani (2005) and Crawley (2002, 2005). At a considerably higher mathematical level, Ruppert et al. (2003) and Wood (2006) are excellent references for linear regression and its extensions. All these books discuss linear regression and show how to apply it in R. Other good textbooks that are not based on R include Montgomery and Peck (1992), Draper and Smith (1998) and Quinn and Keough (2002). Any of the above-mentioned texts that use R can also be used to learn R, but we highly recommend the book by Dalgaard (2002) or, for a slightly different approach, Crawley (2005). However, even if you are completely unfamiliar with R, you should still be able to pick up the essentials from this book and ‘learn it as you go along’. It is not that difficult and, once exposed to R, you will never use anything else. Although various linear regression examples are given in this chapter, a complete example, including all R code and aspects such as interaction, model selection and model validation, is given in Appendix A.
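As an illustration of the first two exploration concepts named above (outliers and collinearity), the following is a minimal sketch in R of how such checks might look on a small data set. Everything in it is an assumption for illustration: the data are simulated (they are not from Ieno et al. (2006) or any other study cited here), and `vif()` is a hypothetical helper written for this example rather than a function from an R package. Outliers are screened with a Cleveland dotplot (Cleveland, 1993), drawn with base R's `dotchart()`, and with `boxplot.stats()`, which returns the values a boxplot would draw beyond its whiskers. Collinearity is screened with variance inflation factors computed from ordinary `lm()` fits.

```r
# Minimal sketch with HYPOTHETICAL simulated data; not data from the book.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)          # built to be nearly collinear with x1
x3 <- rnorm(n)                         # unrelated to x1 and x2
y  <- c(rnorm(n - 1, mean = 10), 25)   # last value is an artificial outlier
dat <- data.frame(y, x1, x2, x3)

# 1. Outliers: Cleveland dotplot. Each value is plotted on the x-axis
#    against its position in the data (y-axis), so isolated points stand out.
dotchart(dat$y, xlab = "Value of y", ylab = "Order of the data",
         main = "Cleveland dotplot")

# Values a boxplot would draw beyond its whiskers (by default, points more
# than 1.5 times the box length beyond the box).
suspect <- boxplot.stats(dat$y)$out
print(suspect)

# 2. Collinearity: variance inflation factors, VIF_j = 1 / (1 - R^2_j),
#    where R^2_j comes from regressing x_j on the other explanatory variables.
#    vif() is a hypothetical helper written for this sketch.
vif <- function(d) {
  sapply(names(d), function(v) {
    others <- setdiff(names(d), v)
    fit <- lm(reformulate(others, response = v), data = d)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vif(dat[, c("x1", "x2", "x3")])
print(round(vifs, 1))
```

In this simulated example the artificial value of 25 appears among the suspect points, and `x1` and `x2` receive much larger VIFs than `x3`, because `x2` was constructed as a noisy copy of `x1`. Whether a flagged point is an error or a genuine observation remains a judgement for the analyst; the code only highlights candidates.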

A.F. Zuur et al., Mixed Effects Models and Extensions in Ecology with R, Statistics for Biology and Health, DOI 10.1007/978-0-387-87458-6_2, © Springer Science+Business Media, LLC 2009

2.1 Data Exploration

2.1.1 Cleveland Dotplots

The first step in any data analysis is data exploration. An important aspect of this step is identifying outliers (we discuss these later), and useful tools for this are boxplots and/or Cleveland dotplots (Cleveland, 1993). As an example of data exploration, we start with data used in Ieno et al. (2006). To identify the effect of species density on nutrient generation in the marine benthos, they applied a two-way ANOVA with nutrient con