Significance Tests: Vitiated or Vindicated by the Replication Crisis in Psychology?
Deborah G. Mayo¹

© Springer Nature B.V. 2020
Abstract  The crisis of replication has led many to blame statistical significance tests for making it too easy to find impressive-looking effects that do not replicate. However, the very fact that it becomes difficult to replicate effects when features of the tests are tied down actually serves to vindicate statistical significance tests. While statistical significance tests, used correctly, serve to bound the probabilities of erroneous interpretations of data, this error control is nullified by data dredging, multiple testing, and other biasing selection effects. Arguments claiming to vitiate statistical significance tests attack straw-person variants of tests that commit well-known fallacies and misinterpretations. There is a tension between popular calls for preregistration – arguably, one of the most promising ways to boost replication – and accounts that downplay error probabilities: Bayes factors, Bayesian posteriors, likelihood ratios. By underscoring the importance of error control for well-testedness, the replication crisis points to reformulating tests so as to avoid fallacies and to report the extent of discrepancies that are and are not indicated with severity.

Keywords  Crisis of replication · Data dredging · Preregistration · Severe testing · Statistical significance
1 Introduction

As new evidence piles up showing lack of replication of statistical results, there has been introspection among statistical practitioners. I focus on the statistical replication crisis in psychology. The statistical methods most used are the ones most criticized: statistical significance tests. The problem of spurious significant results is considered serious enough for the American Statistical Association (ASA) to set out principles for avoiding misinterpretation of significance tests.
* Deborah G. Mayo
[email protected]

¹ Virginia Tech, Blacksburg, VA, USA
The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions. … much confusion and even doubt about the validity of science is arising. (Wasserstein and Lazar 2016, 129)

Many blame statistical significance tests for making it too easy to find impressive-looking effects that do not replicate with predesignated hypotheses and tighter controls. However, the very fact that it becomes difficult to replicate effects when features of the tests are tied down gives new understanding and appreciation for the role of statistical significance tests. It vindicates them. Statistical significance tests are part of a rich conglomeration of tools “for systematically appraising and bounding the probabilities … of seriously misleading interpretations of data” (Birnbaum 1970, 1033). These are a method’s error probabilities. Accounts where probability is used to assess and control a method’s error probabilities I call error statistical. Replication researchers have learned how this error control
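The contrast drawn here – error probabilities that are bounded for a single prespecified test but nullified by dredging across many looks at the data – can be illustrated with a small simulation. This is my own illustrative sketch, not from the paper; the choice of 20 outcomes, the sample size, and all function names are hypothetical:

```python
import math
import random

def one_sided_p(z):
    # Upper-tail p-value P(Z >= z) for a standard normal statistic,
    # computed via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

def simulate(n_tests, n_sims=20_000, alpha=0.05, seed=1):
    # All data are generated under the null hypothesis, so every
    # "significant" result is a false positive. We estimate how often
    # at least one of n_tests independent tests reaches p < alpha --
    # i.e., the error rate of a researcher who dredges across outcomes
    # and reports whichever test "worked".
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(one_sided_p(rng.gauss(0.0, 1.0)) < alpha
               for _ in range(n_tests)):
            hits += 1
    return hits / n_sims

print(simulate(1))   # ≈ 0.05: one preregistered test; the nominal bound holds
print(simulate(20))  # ≈ 0.64: dredging 20 outcomes; close to 1 - 0.95**20
```

With a single prespecified test, the simulated rate of erroneous rejections stays at the nominal 5%; searching across 20 independent outcomes pushes it toward 1 − 0.95²⁰ ≈ 0.64, which is the sense in which selection effects nullify the test's error control.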