“There are two things you are better off not watching in the making: sausages and econometric estimates. … This is a sad and decidedly unscientific state of affairs we find ourselves in. Hardly anyone takes data analyses seriously. Or perhaps more accurately, hardly anyone takes anyone else’s data analyses seriously.”
This is a scathing critique of empirical research by economist Ed Leamer in his famous 1983 paper, “Let’s Take the Con Out of Econometrics.” He meant that empirical estimates are so sensitive to the arbitrary choices made throughout the research process that researchers know not to trust other researchers’ estimates too much. But in the decades since Leamer’s critique, the educated public has tended to take peer-reviewed research seriously.
This began to change with physician John Ioannidis’ landmark 2005 article, “Why Most Published Research Findings Are False.” Throughout the 2010s’ “replication crisis,” concerns grew rapidly, helped in part by the rise of social media. Psychology was hit first and hardest, starting with the 2011 article “False-Positive Psychology.” But economics and the other social sciences are no exception.
A core premise of science is that research should be reproducible. If a scientist creates an experiment to measure a physical constant, like the speed of light, and documents the experiment well, other scientists should be able to perform the same experiment and get the same results. If one lab’s results cannot be reproduced anywhere else, the results are probably not real, just like cold fusion.
You can’t expect to get the same accuracy outside of hard sciences like physics. Perhaps one trial found that a drug reduced heart attacks by 17%, and another trial found that it reduced heart attacks by 14%. But for research to usefully inform our actions, it needs to be at least somewhat reproducible. If a drug is found to be effective, but all subsequent trials show that it is ineffective, people probably shouldn’t take it.
For decades, social science has produced the equivalent of hyped drugs that turn out to be useless or harmful. When a team led by Brian Nosek attempted in 2015 to replicate 100 experiments published in top psychology journals, fewer than half of the replications produced statistically significant results. A Federal Reserve Board discussion paper released the same year showed similarly poor results for published economics papers.
If you can’t trust peer-reviewed research published in top journals, what can you trust? Since 2015, some of the common answers have been “nothing” or a combination of common sense and ideologically based prior beliefs. But the scientific reforms enacted in the wake of the replication crisis may finally be bearing fruit in the form of reproducible and reliable research.
The U.S. military is one of many institutions that rely on social science research to guide decision-making. When the replication crisis raised doubts about that research, it decided to act. The Defense Advanced Research Projects Agency (DARPA), famous for funding breakthroughs in hard technologies like the internet and self-driving cars, funded Brian Nosek and the Center for Open Science to conduct large-scale replications of research across the social sciences. The aim was to test how reliable this research is, and to see whether the more reliable studies share common features.
The results of this effort have just been published in a special issue of Nature. Hundreds of researchers from across the social sciences (I was one of them) attempted to reproduce hundreds of claims from papers published in top social science journals. Overall, the picture is one of improvement from a low baseline. For example, most papers still do not share the data or code behind their results, but sharing is far more common now than it was in 2009, the start of the study period.
Figure 1: Data and code availability by publication year
Source: Nature
By this measure, economics appears to be doing relatively well, as does political science: about half of articles share data or code, compared with fewer than one in ten in education. Economics also does relatively well on “reproducibility,” with most articles clearing this low bar. Reproducibility refers to whether other researchers, analyzing the exact data set described in a published paper using the exact method of analysis the paper describes, obtain exactly the same results. Economics papers yielded exactly the same result 67% of the time, a higher rate than any other field studied.
Figure 2: Reproducibility by field
Source: Nature
I call this a “low bar” because it means only that the original researchers documented what they did well enough for others to copy it, not that what they found was correct (conversely, failing to document it well enough for others to copy doesn’t necessarily mean they were wrong). How do we know if they were right?
Other papers in the Nature issue test how sensitive the results are to reasonable variations in the analytical method. If there are several reasonable ways to analyze the data, did the original researchers just happen to choose (by chance or by careful selection) the only method that yields a statistically significant result, or would most reasonable methods reach more or less the same conclusion?
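To make the idea concrete, here is a minimal sketch of such a robustness check, in Python, on invented data. Nothing here comes from the Nature papers; the variable names, numbers, and the three “reasonable” specifications are all my own illustration.

```python
# Minimal sketch of a robustness check: estimate the same relationship
# under several defensible specifications and compare the answers.
# All numbers are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 500
confounder = rng.normal(size=n)
x = confounder + rng.normal(size=n)
y = 0.3 * x + 0.5 * confounder + rng.normal(size=n)

def ols_slope(x, y):
    """Slope from a simple least-squares fit of y on x."""
    x = x - x.mean()
    return (x * (y - y.mean())).sum() / (x * x).sum()

# Three defensible-looking analysis choices for the same data set:
keep = np.abs(x) < 2 * x.std()  # drop "outliers" beyond 2 SDs of x
estimates = {
    "raw": ols_slope(x, y),
    # ...controlling for the confounder (residualize both variables):
    "adjusted": ols_slope(x - confounder * ols_slope(confounder, x),
                          y - confounder * ols_slope(confounder, y)),
    # ...or trimming the sample:
    "trimmed": ols_slope(x[keep], y[keep]),
}
for name, b in estimates.items():
    print(f"{name:>8s}: {b:+.2f}")
# If defensible choices give very different answers, a single published
# estimate tells you less than its p-value suggests.
```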
Here, most papers come out “directionally correct”: 74% of the robustness tests yielded statistically significant results in the same direction as the original, but only 34% yielded effect sizes very close to the original.
When researchers attempted to reproduce the claims with new data sets (in addition to applying new methods to the existing data), only about half of the results were statistically significant in the same direction as the original, and the effect sizes found were less than half the size of the original estimates.
Overall, this suggests that published social science research typically exaggerates effect sizes and often claims effects that may not exist. This is far from ideal, but relying on research still beats chance by a wide margin: robustness tests found a significant effect in the opposite direction from the original paper only 2% of the time.
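One well-known mechanism behind this exaggeration is the significance filter: when studies are underpowered, the estimates that happen to clear p < 0.05 are systematically too large. The small simulation below is my own illustration of that mechanism, not an analysis from the Nature issue; the true effect and standard error are arbitrary.

```python
# Simulating the "significance filter": among underpowered studies, the
# estimates that reach statistical significance overstate the true effect.
# Purely illustrative numbers, not taken from the Nature papers.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.1             # the real effect (arbitrary units)
se = 0.1                      # standard error of each study's estimate
estimates = true_effect + se * rng.normal(size=100_000)

# Keep only estimates that clear the usual two-sided 5% threshold.
significant = estimates[np.abs(estimates) > 1.96 * se]

print(f"true effect:                {true_effect:.2f}")
print(f"mean estimate, all studies: {estimates.mean():.2f}")
print(f"mean significant estimate:  {significant.mean():.2f}")
print(f"share significant (power):  {len(significant) / len(estimates):.0%}")
```

With power this low, the average statistically significant estimate comes out at more than double the truth, which is consistent with replications finding effect sizes less than half the size of the originals.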
What does this mean for users of research? It’s always a good idea to trust the entire literature rather than a single article. In economics, the Journal of Economic Perspectives does a good job of surveying fields of research in a relatively accessible way.
A simple new rule of thumb inspired by the Nature papers: you could do worse than “cut the estimated effect size in half.” If a published paper says a college degree increases wages by 100%, chances are the degree really does increase wages, but only by about 40-50%. In 2005, John Ioannidis declared that “most published research findings are false.” By 2026, things seem to have improved to the point where “most published research findings are exaggerated.”