Topic
web agents
Cybersecurity #cybersecurity #ai security
MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
MUZZLE is an automated agentic framework that evaluates the security of LLM-based web agents against indirect prompt injection attacks. It discovered 44 new attacks across 4 web applications, including cross-application injection and agent-tailored phishing, by adaptively generating context-aware malicious instructions based on agent execution trajectories.
Jun 16, 2026 1 source
Artificial Intelligence #web agents #process-level evaluation
Process-Level Evaluation of Web Agents Reveals Hidden Performance Differences in AI Systems
Researchers introduce WebStep, a benchmark of 1,800 task instances that evaluates web agents at the process level using semantic state tracking. Key findings show that agents with similar success rates can diverge on process metrics: for example, OpenAI CUA outperforms Qwen3.5 on commit actions but underperforms it on filtering on the Housing website.
Jun 16, 2026 1 source