Artificial Intelligence #technology #ai
DualGauge: Automated Joint Security-Functionality Benchmarking of Specification-Only Code Generation by LLMs and Coding Agents
Researchers present DualGauge, an automated framework for jointly evaluating the correctness and security of code that LLMs generate from natural-language specifications alone. On a benchmark of 307 tasks spanning three programming languages, even the strongest models achieve under 15% joint security-functionality success. Factors such as model scale and instruction tuning do not reliably improve outcomes. Three leading agentic coding systems also show no advantage over direct generation.
Jun 16, 2026 · 1 source