adversarial robustness

3 stories

Artificial Intelligence #llm safety #red-teaming

New Benchmark Reveals Critical Vulnerabilities in LLM Agents Used for Safety-Critical Systems

A new benchmark called NRT-Bench tests multi-turn red-teaming of LLM agents operating a simulated nuclear power plant. Adaptive attacks cause safety-limit breaches in up to 12.1% of sessions, and the vulnerabilities are nearly disjoint across models.

Jun 20, 2026 1 source

GRAPE: New Training Method Boosts Adversarial Robustness with 21% Fewer Parameters

Technology

Artificial Intelligence #adversarial robustness #parameter-space evolution

A new training framework called GRAPE (Guided Parameter-Space Evolution) improves the adversarial robustness of neural networks by progressively exposing parameters. According to an arXiv paper, it achieves 56.94% robust accuracy on CIFAR-10 with 21.4% fewer parameters than standard adversarial training.

Jun 16, 2026 1 source

New Benchmark ARB4WM Evaluates Adversarial Robustness of World Models for Safety-Critical Control

Technology

Artificial Intelligence #ai #adversarial robustness

Researchers have introduced ARB4WM, a unified benchmark for evaluating adversarial robustness of world models used in continuous control systems. The framework tests attacks across policy, value, and latent-dynamics levels, revealing that targeting value estimation and latent representations can be as harmful as direct policy disruption. Early and frequent perturbations are particularly damaging, and input-level defenses offer limited recovery.

Jun 16, 2026 1 source