Open Science Gains Ground: 10-Year AI Study Shows Sharp Rise in Code and Data Sharing

A decade-long analysis of 56,800 AI conference papers shows documentation practices improving dramatically, with the share of papers sharing both code and data rising nearly sixfold, from 11% to 64%. Estimated reproducibility also rose, from 28% to 64%, and the improvements predate formal reproducibility checklists.

iGEN Editorial

June 16, 2026

The reproducibility crisis in artificial intelligence research has prompted major conferences to adopt documentation standards, but a new analysis of 56,800 papers from 2014 to 2024 suggests that the field's improvement in sharing code and data predates and far exceeds the impact of these formal requirements. According to a study by Coakley, Snelleman, Hoos, and Gundersen, published on arXiv, the proportion of papers that share both code and data increased nearly sixfold over the decade, from 11% to 64%.

Methodology and Scope

The researchers assessed all papers published at five leading AI conferences from 2014 to 2024. They identified seven reproducibility variables, which were quality-assured, and used them to analyze the 56,800 publications. The study examined documentation practices rather than testing reproducibility directly: its reproducibility estimates were inferred from those practices, using empirical reproducibility rates from a prior study.

Key Findings

Metric	2014	2024
Papers sharing both code and data	11%	64%
Estimated reproducibility	28%	64%
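
As a quick check on the "nearly sixfold" description, the arithmetic behind the table can be sketched in a few lines of Python. The function names `fold_change` and `point_change` are illustrative helpers, not anything from the study; the only inputs are the four percentages reported above.

```python
def fold_change(start_pct: float, end_pct: float) -> float:
    """Ratio of the final share to the initial share."""
    return end_pct / start_pct


def point_change(start_pct: float, end_pct: float) -> float:
    """Difference between the two shares, in percentage points."""
    return end_pct - start_pct


# Figures as reported in the article (2014 value, 2024 value).
metrics = {
    "Papers sharing both code and data": (11.0, 64.0),
    "Estimated reproducibility": (28.0, 64.0),
}

for name, (start, end) in metrics.items():
    print(
        f"{name}: {fold_change(start, end):.1f}x, "
        f"+{point_change(start, end):.0f} percentage points"
    )
# Papers sharing both code and data: 5.8x, +53 percentage points
# Estimated reproducibility: 2.3x, +36 percentage points
```

The ratio of 64 to 11 is about 5.8, which is consistent with "nearly sixfold", while estimated reproidentifiability is not at issue here: the second row shows estimated reproducibility slightly more than doubling over the same period.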

According to the study, improvements in documentation practices predate the introduction of reproducibility checklists, suggesting that the changes reflect a broader movement toward open science rather than a direct response to formal requirements. The authors note that documentation practices improved substantially between 2014 and 2024.

Implications for AI Adoption

For enterprise technology leaders evaluating AI systems, the trend toward increased code and data sharing makes it easier to verify and reproduce research findings. While the study does not directly assess commercial AI products, the same open-science principles that drive increased reproducibility in academic research can reduce the risk of adopting opaque or non-reproducible models. The rise from 11% to 64% in code and data sharing indicates that a majority of papers at these conferences now provide the building blocks needed for independent validation.

The broader open science movement, as evidenced by this analysis, is reshaping how AI research is conducted and disseminated. Enterprise buyers of AI solutions should consider whether vendors' claims are grounded in reproducible, openly documented work, a practice that this study shows is becoming the norm rather than the exception.
