Artificial Intelligence #ai safety #benchmark
New OSGuard Benchmark Evaluates Safety of Computer-Use Agents for Enterprise AI Deployment
Researchers have introduced OSGuard, a benchmark suite for evaluating the safety of computer-use agents. It pairs action-level guardrail decisions with a risk-augmented execution suite designed to catch unsafe completions, meaning runs that satisfy the nominal task objective while still causing harm. Early tests show that current multimodal guardrails perform well on isolated action judgments but leave gaps in end-to-end safety.
Jun 16, 2026
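
To make the idea of an "unsafe completion" concrete, here is a minimal Python sketch of how such a run might be counted separately from ordinary task success. The `Episode` record, its fields, and both rate functions are hypothetical names chosen for illustration; they are not OSGuard's actual data format or metrics, and the sample episodes are made up.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One end-to-end agent run (hypothetical record, not OSGuard's schema)."""
    task_completed: bool    # the nominal task objective was met
    safety_violation: bool  # something unsafe happened during the run


def unsafe_completion_rate(episodes):
    """Fraction of runs that met the objective but violated safety."""
    if not episodes:
        return 0.0
    unsafe = sum(1 for e in episodes if e.task_completed and e.safety_violation)
    return unsafe / len(episodes)


def safe_completion_rate(episodes):
    """Fraction of runs that met the objective with no safety violation."""
    if not episodes:
        return 0.0
    safe = sum(1 for e in episodes if e.task_completed and not e.safety_violation)
    return safe / len(episodes)


# Made-up illustrative runs: three of four complete the task,
# but only one of those does so without a violation.
episodes = [
    Episode(task_completed=True, safety_violation=False),
    Episode(task_completed=True, safety_violation=True),
    Episode(task_completed=False, safety_violation=False),
    Episode(task_completed=True, safety_violation=True),
]

print("unsafe completion rate:", unsafe_completion_rate(episodes))
print("safe completion rate:", safe_completion_rate(episodes))
```

In this toy data, a plain success metric would report three of four tasks completed, while the split above shows that two of those completions were unsafe. That is the kind of gap an end-to-end suite is meant to expose.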