Randomization Tests


10.1 Introduction

If the model assumptions for ANOVA do not hold, then the ANOVA F test is not necessarily valid for testing the hypothesis of equal means. However, one can still compute an ANOVA table and an F statistic; what is in doubt is whether the “F” ratio has an F distribution. A randomization test or permutation test provides a nonparametric approach based on the F statistic that does not require the test statistic (F) to have an F distribution.

Permutation tests were introduced in the basic inference chapter (Section 6.4.4) for testing the two-sample hypothesis of equal means. Just as ANOVA generalizes the two-sample t-test for equal means to k ≥ 2 samples, the randomization tests discussed in this chapter generalize the two-sample permutation test discussed in Section 6.4.4. The main idea of a randomization (permutation) test is explained below in Section 10.3.
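As a preview of the approach, here is a minimal sketch of an approximate permutation test based on the F statistic. It is an illustration under stated assumptions, not the implementation developed later in the chapter: the data are simulated, and the names f_stat, perm_f_test, and the number of replicates R = 999 are hypothetical choices made for this example.

```r
# Hypothetical data: three groups of unequal size.
# These values are simulated for illustration and are not from the text.
set.seed(1)
y <- c(rexp(8, rate = 1), rexp(10, rate = 1), rexp(9, rate = 0.5))
group <- factor(rep(c("A", "B", "C"), times = c(8, 10, 9)))

# F statistic taken from the one-way ANOVA table
f_stat <- function(y, group) {
  anova(lm(y ~ group))[1, "F value"]
}

# Approximate permutation test: shuffle the responses among the
# fixed group labels and recompute F for each shuffle
perm_f_test <- function(y, group, R = 999) {
  f_obs <- f_stat(y, group)
  f_perm <- replicate(R, f_stat(sample(y), group))
  # Proportion of statistics at least as large as the observed one,
  # counting the observed sample itself as one arrangement
  p_value <- (1 + sum(f_perm >= f_obs)) / (R + 1)
  list(statistic = f_obs, p.value = p_value)
}

result <- perm_f_test(y, group)
result$statistic
result$p.value
```

The p-value is computed from the permutation replicates of F rather than from the F distribution, so the sketch does not rely on the ANOVA model assumptions for the reference distribution.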

10.2 Exploring Data for One-way Analysis

Prior to applying formal methods of statistical inference in any data analysis problem, it is essential to explore the data with descriptive and graphical summaries. If the research question is to determine whether groups differ in location, then the preliminary analysis helps to determine whether a one-factor model is reasonable or whether one or more other variables should be included in the model. In the exploratory analysis one can check informally whether certain parametric model assumptions hold, which helps to identify the type of analysis (parametric or nonparametric) that is most suitable for the data at hand.

J. Albert and M. Rizzo, R by Example, Use R, DOI 10.1007/978-1-4614-1365-3_10, © Springer Science+Business Media, LLC 2012

Example 10.1 (The ‘Waste Run-up’ data). The ‘Waste Run-up’ data ([28, p. 86], [12]) are available at the DASL web site. The data are weekly percentage waste of cloth by five different supplier plants of Levi-Strauss, relative to cutting from a computer pattern. The question here is whether the five supplier plants differ in waste run-up.

The data have been saved in a text file “wasterunup.txt”. The five columns correspond to the five different manufacturing plants. In the data file, the numbers of values in each column differ and the empty positions are filled with the symbol *.

To convert these data into the one-way layout for comparison of groups, we first need to read the data into R. We use read.table to read the text file into R. The special “*” character is specified by setting na.strings="*" in the arguments to read.table. Then the data can be stacked using the stack function, which places all observations into a single column and creates an index variable labeled with the column names. The result is exactly what we need except for the NA values, which can be removed by applying na.omit to the result.

> waste = read.table(
+     file="wasterunup.txt",
+     header=TRUE, na.strings="*")
> head(waste)   #top of the data set

   PT1   PT2  PT3  PT4  PT5
1  1.2  16.4 12.1 11.5 24.0
2 10.1  -6.0  9.7 10.2 -3.7
3 -2.0 -11.6  7.4  3.8  8.2
4  1.