Significance Tests: Vitiated or Vindicated by the Replication Crisis in Psychology?
Deborah G. Mayo¹

© Springer Nature B.V. 2020
Abstract  The crisis of replication has led many to blame statistical significance tests for making it too easy to find impressive-looking effects that do not replicate. However, the very fact that it becomes difficult to replicate effects when features of the tests are tied down actually serves to vindicate statistical significance tests. While statistical significance tests, used correctly, serve to bound the probabilities of erroneous interpretations of data, this error control is nullified by data dredging, multiple testing, and other biasing selection effects. Arguments claiming to vitiate statistical significance tests attack straw-person variants of tests that commit well-known fallacies and misinterpretations. There is a tension between popular calls for preregistration – arguably, one of the most promising ways to boost replication – and accounts that downplay error probabilities: Bayes factors, Bayesian posteriors, likelihood ratios. By underscoring the importance of error control for well-testedness, the replication crisis points to reformulating tests so as to avoid fallacies and to report the extent of discrepancies that are and are not indicated with severity.

Keywords  Crisis of replication · Data dredging · Preregistration · Severe testing · Statistical significance
1 Introduction

As new evidence piles up showing lack of replication of statistical results, there has been introspection among statistical practitioners. I focus on the statistical replication crisis in psychology. The statistical methods most used are the ones most criticized: statistical significance tests. The problem of spurious significant results is considered serious enough for the American Statistical Association (ASA) to set out principles for avoiding misinterpretation of significance tests.
* Deborah G. Mayo
[email protected]

¹ Virginia Tech, Blacksburg, VA, USA
The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions. … much confusion and even doubt about the validity of science is arising. (Wasserstein and Lazar 2016, 129)

Many blame statistical significance tests for making it too easy to find impressive-looking effects that do not replicate with predesignated hypotheses and tighter controls. However, the very fact that it becomes difficult to replicate effects when features of the tests are tied down gives new understanding and appreciation for the role of statistical significance tests. It vindicates them. Statistical significance tests are part of a rich conglomeration of tools “for systematically appraising and bounding the probabilities … of seriously misleading interpretations of data” (Birnbaum 1970, 1033). These are a method’s error probabilities. Accounts where probability is used to assess and control a method’s error probabilities I call error statistical. Replication researchers have learned how this error control
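The contrast drawn here – error probabilities that are bounded for a single prespecified test but nullified by dredging across many looks at the data – can be illustrated with a small simulation. This is my own illustrative sketch, not from the paper; the choice of 20 outcomes, the sample size, and all function names are hypothetical:

```python
import math
import random

def one_sided_p(z):
    # Upper-tail p-value P(Z >= z) for a standard normal statistic,
    # computed via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

def simulate(n_tests, n_sims=20_000, alpha=0.05, seed=1):
    # All data are generated under the null hypothesis, so every
    # "significant" result is a false positive. We estimate how often
    # at least one of n_tests independent tests reaches p < alpha --
    # i.e., the error rate of a researcher who dredges across outcomes
    # and reports whichever test "worked".
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(one_sided_p(rng.gauss(0.0, 1.0)) < alpha
               for _ in range(n_tests)):
            hits += 1
    return hits / n_sims

print(simulate(1))   # ≈ 0.05: one preregistered test; the nominal bound holds
print(simulate(20))  # ≈ 0.64: dredging 20 outcomes; close to 1 - 0.95**20
```

With a single prespecified test, the simulated rate of erroneous rejections stays at the nominal 5%; searching across 20 independent outcomes pushes it toward 1 − 0.95²⁰ ≈ 0.64, which is the sense in which selection effects nullify the test's error control.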