Enterprise software teams face a widening gap between code supply and reviewer bandwidth as AI-assisted coding tools accelerate output. At Meta, lines of code per human-landed diff grew 105.9% year over year, and per-developer diff volume rose 51%, with agentic AI responsible for over 80% of that growth, according to a paper posted on arXiv. The share of diffs receiving timely review has declined, exposing a bottleneck that risk-aware automation can address.
Meta's solution is RADAR (Risk Aware Diff Auto Review), a multi-stage funnel. It first classifies each diff by authorship and source type, then passes it through eligibility gates, static heuristics, a machine-learned Diff Risk Score, LLM-based automated code review, and deterministic validation, and finally lands the changes that qualify. The system evaluates diffs across diverse organizations within Meta.
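To make the funnel idea concrete, here is a minimal sketch of a staged review pipeline in which a diff is auto-approved only if every stage passes. It is an illustration, not Meta's implementation: the `Diff` fields, the stage names, the eligible source types, the 200-line limit, and the always-pass LLM placeholder are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Diff:
    # All fields are hypothetical stand-ins for whatever a real system records.
    diff_id: str
    author_kind: str    # classification by authorship, e.g. "human" or "agent"
    source_type: str    # classification by source type, e.g. "codemod"
    lines_changed: int
    risk_score: float   # stand-in for a learned risk score in [0, 1]
    tests_pass: bool


# A stage is a name plus a predicate that returns True when the diff passes.
Stage = Tuple[str, Callable[[Diff], bool]]


def build_funnel(risk_threshold: float) -> List[Stage]:
    """Assemble the stages in order; thresholds and rules are illustrative."""
    return [
        ("eligibility_gate", lambda d: d.source_type in {"codemod", "config"}),
        ("static_heuristics", lambda d: d.lines_changed <= 200),
        ("diff_risk_score", lambda d: d.risk_score <= risk_threshold),
        # Placeholder: a real system would call a code-review model here.
        ("llm_review", lambda d: True),
        ("deterministic_validation", lambda d: d.tests_pass),
    ]


def review(diff: Diff, funnel: List[Stage]) -> Optional[str]:
    """Return the name of the first stage that rejects the diff, or None if all pass."""
    for name, check in funnel:
        if not check(diff):
            return name
    return None


funnel = build_funnel(risk_threshold=0.3)
low_risk = Diff("D1", "agent", "codemod", 40, 0.1, True)
high_risk = Diff("D2", "agent", "codemod", 40, 0.9, True)
print(review(low_risk, funnel))   # None: passed every stage
print(review(high_risk, funnel))  # diff_risk_score
```

Ordering the stages from cheapest to most expensive means most ineligible diffs drop out before the costlier checks run, and a rejected diff simply falls back to human review.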
RADAR's Impact Metrics
The research team reported that RADAR has reviewed 535K+ diffs and landed 331K+. Relaxing the Diff Risk Score threshold from the 25th to the 50th percentile increased the approve rate to 60.31%. The key outcomes are summarized in the table below.
| Metric | RADAR-reviewed diffs relative to non-RADAR diffs |
|---|---|
| Revert rate | 1/3 of non-RADAR |
| Production incident rate | 1/50 of non-RADAR |
| Median time to close | Reduced by >330% |
| Median diff review wall time | Reduced by 35% |
Risk Calibration and Trade-offs
A central question the paper explores is how tuning the risk threshold trades automation yield against safety. The authors found that moving the Diff Risk Score threshold from the 25th to the 50th percentile raised the approve rate to 60.31%. Even at this relaxed threshold, revert rates remained one-third of those for non-RADAR diffs, and production incident rates were 1/50 of the non-RADAR rate.
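The effect of a percentile-based threshold can be illustrated with a small sketch. It assumes the threshold is taken as a percentile of the observed risk-score distribution and that a diff passes the risk stage when its score is at or below that value. The function names, the nearest-rank percentile method, and the synthetic scores are all assumptions for illustration. The fraction computed here is the pass rate at the risk stage alone, not the end-to-end approve rate the paper reports, since other funnel stages can still reject a diff.

```python
import math
from typing import List


def percentile_threshold(scores: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest score with at least pct% of scores at or below it."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def risk_stage_pass_rate(scores: List[float], threshold: float) -> float:
    """Fraction of diffs whose risk score is at or below the threshold."""
    return sum(s <= threshold for s in scores) / len(scores)


# Synthetic, evenly spaced scores; real score distributions will look different.
scores = [i / 20 for i in range(1, 21)]

for pct in (25, 50):
    threshold = percentile_threshold(scores, pct)
    rate = risk_stage_pass_rate(scores, threshold)
    print(f"p{pct}: threshold={threshold:.2f}, risk-stage pass rate={rate:.0%}")
```

Relaxing the percentile admits more diffs into the later stages, which is why safety metrics such as revert and incident rates need to be tracked alongside the yield.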
Efficiency Gains for AI-Generated Changes
RADAR specifically targets review bottlenecks created by AI-driven code growth. The system reduces median time to close by over 330% and median diff review wall time by 35%, according to the paper. These gains are achieved through layered automation that does not compromise production safety.
For enterprise technology leaders evaluating automation strategies, RADAR demonstrates that risk-stratified, multi-stage review can materially reduce latency while maintaining or improving quality. The approach, which combines static heuristics, a learned risk score, and LLM-based review, is organization-agnostic in principle, though the paper focuses on Meta's implementation.
Implications for Enterprise Software Delivery
While RADAR is purpose-built for code review, the underlying methodology applies to any domain where human review is a bottleneck and changes can be classified by risk. The lower revert rate (one-third of the non-RADAR baseline) and lower production incident rate (1/50 of baseline) suggest that automation, when calibrated correctly, can outperform manual review for low-risk changes. Companies scaling AI-generated contributions in internal or customer-facing systems can adopt similar risk stratification to preserve velocity without sacrificing reliability.