According to a study published in the latest issue of Science, only 39 out of 100 psychology papers could be repeated with similar results.
A team led by Brian Nosek from the Center for Open Science (COS) spent four years reproducing the results from a hundred psychology papers published in a handful of leading journals. The team found that more than half could not be reproduced, calling a good deal of established research into question.
“For years there has been concern about the reproducibility of scientific findings, but little direct, systematic evidence. This project is the first of its kind and adds substantial evidence that the concerns are real and addressable,” said Nosek in a press release.
If a study is reproducible, different groups of researchers performing the same experiment should get the same results. If they don’t, it could mean that the original study is flawed, that one of the research teams made a mistake, or that an unaccounted-for variable is altering the results. Regardless, the fact that so many studies can’t be reproduced means that their results probably shouldn’t be taken as fact without independent verification.
The problem might not even be with the original research at all, points out Elizabeth Gilbert, a Reproducibility Project team member. “A replication team must have a complete understanding of the methodology used for the original research, and shifts in the context or conditions of the research could be unrecognized but important for observing the result,” she said.
The issue isn’t necessarily that all of these psych studies are bad; it’s that we have no way of determining how many of them are good.
Nosek created the Reproducibility Project to test whether published studies in a variety of fields could be replicated. To do this, he recruited 270 researchers from around the globe to work in teams replicating articles published in 2008 in one of three top psychology journals: Psychological Science; Journal of Personality and Social Psychology; and Journal of Experimental Psychology: Learning, Memory and Cognition. Many of these teams collaborated with the authors of the original papers to ensure they were repeating each study as exactly as possible.
What they found is that, while 97% of the original studies reported significant results, only 36% of the replications did. This means that nearly two-thirds of the studies examined could not be reproduced by that measure. And because the studies examined by the project weren’t selected based on any special criteria, it’s possible that most of the psychology studies published in the past few years aren’t reproducible either.
“The findings demonstrate that reproducing original results may be more difficult than is presently assumed, and interventions may be needed to improve reproducibility,” said Johanna Cohoon, one of the project coordinators with COS.
The Scope of the Problem
One possible solution would be for scientists to attempt to reproduce every published psychology study, in order to determine which ones are reliable. Unfortunately, there probably aren’t enough scientists to do that, and even if there were, most are uninterested.
“Scientists aim to contribute reliable knowledge, but also need to produce results that help them keep their job as a researcher,” said Nosek. “To thrive in science, researchers need to earn publications, and some kinds of results are easier to publish than others, particularly ones that are novel and show unexpected or exciting new directions.”
There’s plenty of encouragement for scientists who are doing original research, but almost none for scientists who are reproducing research someone else already did. There isn’t a whole lot of grant money available for people who check other scientists’ work, and it’s usually not very glamorous or exciting.
What this means is that many studies that look good on paper, but are unreliable for reasons that are impossible to detect, will slip through the cracks and into established knowledge because nobody’s willing to double-check them. This knowledge may influence everything from best therapy practice to national policy decisions.
This irreproducibility problem isn’t limited to psychology research, either. Previous efforts in the field of cancer biology found reproducibility rates as low as 11%. The concern reaches well beyond a few disciplines and touches much of science.
Changes to Be Made
Nosek’s group has made clear that real changes need to be made, both in the way papers are published in journals and in the way other scientists and the public treat published research.
“We must stop treating single studies as unassailable authorities of the truth. Until a discovery has been thoroughly vetted and repeatedly observed, we should treat it with the measure of skepticism that scientific thinking requires,” say Elizabeth Gilbert and Nina Strohminger.
The Reproducibility Project also found that studies with stronger evidence backing their conclusions were more likely to be successfully replicated. For instance, studies often quantify the strength of their data by giving a “p-value,” which is the probability of seeing an effect at least as large as the one observed if chance alone were at work and there were no real effect. The lower the p-value, the more statistically significant the result. Nosek’s team found that studies with lower p-values were more likely to be reproducible (although there are problems with p-values, too).
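To make the idea of a p-value concrete, here is a minimal sketch in Python of one common way to estimate it, called a permutation test. It is an illustration only, not the method the Reproducibility Project used; the function name `permutation_p_value` and all of the scores are made up for the example. The idea: if the treatment did nothing, the group labels are arbitrary, so we shuffle them many times and count how often chance alone produces a gap at least as large as the one actually observed.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration: shuffles the pooled scores and
    counts how often a random split is at least as extreme as the real one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels were assigned at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up scores: a large gap between groups versus almost no gap.
strong_treated = [12, 14, 15, 13, 16, 15, 14, 17]
strong_control = [8, 9, 7, 10, 9, 8, 11, 9]
weak_treated = [10, 12, 9, 11, 13, 10, 12, 11]
weak_control = [10, 11, 9, 12, 10, 11, 13, 10]

p_strong = permutation_p_value(strong_treated, strong_control)
p_weak = permutation_p_value(weak_treated, weak_control)
print(f"large gap: p = {p_strong:.4f}")
print(f"small gap: p = {p_weak:.4f}")
```

With these invented numbers, the large gap almost never shows up in the shuffled data, so its p-value is small, while the small gap appears by chance most of the time, so its p-value is large. Real studies typically use standard statistical tests rather than this hand-rolled version, but the logic is the same.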
Some journals are also pushing for increased reproducibility. In 2014, the Association for Psychological Science (APS), which manages one of the journals used in Nosek’s study, started offering incentives to researchers who made their data and methods public. Ultimately, more public data and methods will make it easier for researchers, or groups like Nosek’s, to redo experiments to try to confirm their results.
None of these measures will solve the reproducibility problem completely, and we will never get anywhere close to a 100% reproducibility rate no matter how hard we try. But studies like Nosek’s are helping us recognize the scope of the problem, and initiatives like the APS’s offer a path to a solution. We’re not out of the woods yet, but it’s a start.