The reproducibility crisis in artificial intelligence research has prompted major conferences to adopt documentation standards, but a new analysis of 56,800 papers from 2014 to 2024 suggests that the field's improvement in sharing code and data began before these formal requirements were introduced and goes well beyond what they can account for. According to a study by Coakley, Snelleman, Hoos, and Gundersen, posted on arXiv, the proportion of papers that share both code and data increased nearly sixfold over the decade, from 11% to 64%.
Methodology and Scope
The researchers assessed all papers published at five leading AI conferences from 2014 to 2024. They identified and quality-assured seven reproducibility variables and used them to analyze the 56,800 publications. The study examined documentation practices rather than testing reproducibility directly: its reproducibility estimates were inferred from those practices, using empirical reproducibility rates from a prior study.
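One plausible reading of that inference step is a weighted average: each documentation category contributes its share of papers multiplied by the reproducibility rate observed for that category in earlier work. The sketch below illustrates the idea only. The category names, the shares, and the rates are hypothetical placeholders, not values from the study, and the authors' actual procedure may differ.

```python
def estimate_reproducibility(shares: dict[str, float], rates: dict[str, float]) -> float:
    """Weighted average of per-category reproducibility rates.

    shares: fraction of papers in each documentation category (must sum to 1).
    rates:  assumed empirical reproducibility rate for each category.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("category shares must sum to 1")
    missing = set(shares) - set(rates)
    if missing:
        raise ValueError(f"no rate given for categories: {sorted(missing)}")
    return sum(shares[category] * rates[category] for category in shares)


# HYPOTHETICAL inputs, for illustration only. None of these numbers
# come from the study or from the prior work it relies on.
hypothetical_shares = {"code_and_data": 0.5, "data_only": 0.3, "neither": 0.2}
hypothetical_rates = {"code_and_data": 0.8, "data_only": 0.5, "neither": 0.2}

estimate = estimate_reproducibility(hypothetical_shares, hypothetical_rates)
print(f"Estimated reproducibility (hypothetical inputs): {estimate:.0%}")
```

The point of the sketch is that an estimate of this kind moves only when the mix of documentation practices moves, so it measures sharing behavior and not the outcome of actual replication attempts.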
Key Findings
| Metric | 2014 | 2024 |
|---|---|---|
| Papers sharing both code and data | 11% | 64% |
| Estimated reproducibility | 28% | 64% |
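
The table's two rows can be compared with simple arithmetic. The short sketch below uses only the four percentages reported above and computes the relative (fold) change and the absolute change in percentage points for each metric; the function names are illustrative.

```python
def fold_change(start_pct: float, end_pct: float) -> float:
    """Relative growth: how many times larger the end value is than the start."""
    return end_pct / start_pct


def point_change(start_pct: float, end_pct: float) -> float:
    """Absolute growth in percentage points."""
    return end_pct - start_pct


# (2014 value, 2024 value) in percent, as reported in the table.
metrics = {
    "Papers sharing both code and data": (11.0, 64.0),
    "Estimated reproducibility": (28.0, 64.0),
}

for name, (start, end) in metrics.items():
    print(
        f"{name}: {fold_change(start, end):.1f}x, "
        f"+{point_change(start, end):.0f} percentage points"
    )
```

Sharing of code and data grew about 5.8 times (the "nearly sixfold" figure), a gain of 53 percentage points, while estimated reproducibility grew about 2.3 times, a gain of 36 points.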
According to the study, improvements in documentation practices predate the introduction of reproducibility checklists, which suggests that the changes reflect a broader movement toward open science rather than a direct response to formal requirements. The authors note that documentation practices improved substantially between 2014 and 2024.
Implications for AI Adoption
For enterprise technology leaders evaluating AI systems, the trend toward increased code and data sharing makes it easier to verify and reproduce research findings. While the study does not directly assess commercial AI products, the same open-science principles that drive increased reproducibility in academic research can reduce the risk of adopting opaque or non-reproducible models. The shift from 11% to 64% in code and data sharing indicates that a majority of papers at these conferences now provide the building blocks needed for independent validation.
The broader open science movement, as evidenced by this analysis, is reshaping how AI research is conducted and disseminated. Enterprise buyers of AI solutions should consider whether vendors' claims are grounded in reproducible, openly documented work, a practice that this study shows is becoming the norm rather than the exception.