From Overconfidence in Research to Overcertainty in Policy Analysis: Can We Escape the Cycle of Hype and Disappointment?

ANDREW GELMAN
Professor of Statistics and Political Science, Columbia University

This will not be yet another essay on the dangers of the digital panopticon. Nor will it chronicle the ways in which automatic and seemingly unbiased algorithms can carry hidden assumptions with potentially malign social consequences. These concerns are real, but here I want to focus on something different, which is the way that overconfidence in research claims can pollute policy analysis.

A fundamental principle of science is that your conclusions are only as good as your data. Unfortunately, certain standard paradigms in statistical analysis can obscure this point. In particular, conventional reasoning based on statistical significance can lead researchers to routinely extract apparent discoveries from noise, and to grossly exaggerate effects, even in randomized controlled studies.

There have been many well-publicized examples of such errors: for example, a claim announcing the discovery of extra-sensory perception, and another that single women during a certain time of the month were 20 percentage points more likely to support Barack Obama, both published in leading journals of psychology [1-2]. In these and many other notorious examples, the claims were based on what might be considered overanalysis of limited data, meaning there is no reason to believe the results would replicate in new studies. In this way these examples are consistent with a larger “replication crisis” in the social and biological sciences, in which many theories that had been considered firmly established disappeared under attempts at replication, at which point it became possible to read the original papers with fresh eyes and recognize fatal flaws in their statistical analysis and data collection.

What does this have to do with policy? Surely we are not considering investing billions in ESP, designing political messages based on the menstrual cycle, or making any other major decisions based on headline-grabbing but speculative research.

No, but policy decisions can be made based on social science research; consider, for example, the role of quantitative studies in various debates in education regarding the efficacy of charter schools or funding for early-childhood intervention. Indeed, it is a tenet of the evidence-based policy movement that we should be using results from research studies, especially randomized controlled trials, to inform decision-making.


Here’s the problem. Results from an experiment or policy analysis are supposed to be published, and publicized, only if they are “statistically significant”; that is, the estimated effect, from the data at hand, must be large enough that it could not plausibly be explained by chance alone. However, given what psychologists Joe Simmons, Leif Nelson, and Uri Simonsohn have called “researcher degrees of freedom” in data coding and analysis [3], it is possible for researchers to find success with almost any data set, in the form of an apparently statistically significant effect. In addition to the concern that such findings can be pulled out of what is essentially pure noise (recall the ESP and ovulation-and-voting examples), another major problem is that published estimates, which are selected to be large enough to exceed the statistical-significance threshold, are, because of this selection, overestimates of any underlying effects. In statistical terminology, these estimates suffer from selection bias [4].

Where researchers study the effects of social interventions and focus on statistically significant comparisons, their published results will, on average, overestimate effect sizes. And this happens even with honest, experienced, well-intentioned researchers using clean, randomized designs, as long as they are following standard practice and reporting statistically significant results [5].
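To see how this selection effect works in miniature, consider the following simulation sketch. The code and the numbers in it are purely illustrative and are not taken from any of the studies discussed here: each hypothetical study estimates a small true effect with a large standard error, and we then look only at the estimates that clear the conventional significance threshold.

```python
import numpy as np

# Purely illustrative numbers: a small true effect measured with large noise.
rng = np.random.default_rng(0)
true_effect = 2.0       # the actual average effect of the intervention
standard_error = 8.0    # sampling noise in each study's estimate
n_studies = 100_000     # number of hypothetical studies to simulate

# Each simulated study produces one noisy estimate of the effect.
estimates = rng.normal(true_effect, standard_error, size=n_studies)

# A two-sided test at the 5% level declares significance roughly when the
# estimate exceeds 1.96 standard errors in absolute value.
significant = np.abs(estimates) > 1.96 * standard_error

print(f"true effect:                  {true_effect:.1f}")
print(f"mean of all estimates:        {estimates.mean():.1f}")
print(f"share reaching significance:  {significant.mean():.1%}")
print(f"mean |significant estimate|:  {np.abs(estimates[significant]).mean():.1f}")
```

Under these illustrative settings, only a small fraction of studies reach significance, and the estimates that do are many times larger in magnitude than the true effect; some even have the wrong sign. That is the selection bias described above, and it arises with no manipulation by anyone.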

Carry this into policy recommendations and you can get wildly optimistic expectations of success. For example, an educational intervention involving growth mindset was promoted as producing an average 31-percentage-point gain in test scores [6-7]; two years later, the designers of the program dialed down expectations and assessed possible effects as much lower. The point of the present essay is to emphasize that this pattern of unrealistic expectation can arise without any attempt at manipulation by the researchers involved; it can come simply from the selection biases built into traditional statistical procedures.

It is a challenge for statisticians and policymakers to escape this cycle of hype and disappointment, and to do so we must first recognize the problems with the seductive approach by which estimated effects that are “statistically significant” are taken to be true.

Endnotes

1. Daryl J. Bem, “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect,” Journal of Personality and Social Psychology 100, no. 3 (2011): 407–425, available at https://www.ncbi.nlm.nih.gov/pubmed/21280961.

2. Kristina M. Durante, Ashley Rae, and Vladas Griskevicius, “The Fluctuating Female Vote: Politics, Religion, and the Ovulatory Cycle,” Psychological Science 24, no. 6 (2013): 1007–1016, available at https://www.ncbi.nlm.nih.gov/pubmed/23613210.

3. Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science 22, no. 11 (2011): 1359–1366, available at https://www.ncbi.nlm.nih.gov/pubmed/22006061.

4. Andrew Gelman and John B. Carlin, “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors,” Perspectives on Psychological Science 9, no. 6 (2014): 641–651, available at https://www.ncbi.nlm.nih.gov/pubmed/26186114.

5. Andrew Gelman, “Honesty and Transparency Are Not Enough,” Chance 30, no. 1 (2017): 37–39, available at http://www.stat.columbia.edu/~gelman/research/published/ChanceEthics14.pdf.

6. Richard A. Friedman, “Can You Get Smarter?,” The New York Times, October 23, 2015, available at https://www.nytimes.com/2015/10/25/opinion/sunday/can-you-get-smarter.html.

7. Carol Dweck, “Growth Mindset Interventions Yield Impressive Results,” The Conversation, June 26, 2018, available at https://theconversation.com/growth-mindset-interventions-yield-impressive-results-97423.
