Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks, such as booking travel or managing supply chain portals, by directly interacting with websites. However, this design exposes them to indirect prompt injection attacks hidden in untrusted web content, allowing adversaries to hijack agent behavior and violate user intent. According to a new paper on arXiv, existing evaluations rely on fixed attack templates or narrowly scoped scenarios, limiting their ability to capture realistic attacks.
To address this gap, researchers introduced MUZZLE, an automated agentic framework for red-teaming web agents against indirect prompt injection. MUZZLE uses the agent's own execution trajectories to automatically identify high-salience injection surfaces and adaptively generate context-aware malicious instructions targeting violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on observed execution and iteratively refines attacks using feedback from failed executions.
How MUZZLE Works
MUZZLE operates with minimal human intervention. It analyzes the agent's step-by-step actions (its execution trajectory) to pinpoint where an attacker could inject malicious content that the agent would actually process. It then generates context-aware malicious instructions for those locations, each aimed at violating confidentiality, integrity, or availability. When an attack fails, the framework uses feedback from that execution to refine the next attempt, which the authors argue captures more realistic attacks than fixed templates can.
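The loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Step` fields, the read-count salience heuristic, and the `generate` and `run_agent` callables are all hypothetical stand-ins (for an LLM-based payload generator and a sandboxed agent run, respectively).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One step of a recorded agent trajectory (hypothetical structure)."""
    url: str
    element_id: str          # page element the agent read at this step
    attacker_writable: bool  # could an untrusted party control this element?
    read_count: int          # how many times the agent consumed it


@dataclass
class Attempt:
    surface: str
    payload: str
    succeeded: bool


def rank_surfaces(trajectory: List[Step]) -> List[str]:
    """Return attacker-writable elements, most frequently read first."""
    totals = {}
    for step in trajectory:
        if step.attacker_writable:
            totals[step.element_id] = totals.get(step.element_id, 0) + step.read_count
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(totals, key=lambda e: (-totals[e], e))


def red_team(
    trajectory: List[Step],
    objective: str,
    generate: Callable[[str, str, Optional[str]], str],
    run_agent: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Try each surface in rank order, refining the payload after each failure."""
    history: List[Attempt] = []
    for surface in rank_surfaces(trajectory):
        feedback = None  # no feedback yet for a fresh surface
        for _ in range(max_rounds):
            payload = generate(objective, surface, feedback)
            succeeded, feedback = run_agent(surface, payload)
            history.append(Attempt(surface, payload, succeeded))
            if succeeded:
                return history
    return history


def demo_generate(objective, surface, feedback):
    # Placeholder for an LLM call that writes a context-aware instruction.
    suffix = "" if feedback is None else f" (revised after: {feedback})"
    return f"[{surface}] {objective}{suffix}"


def demo_run_agent(surface, payload):
    # Placeholder for a sandboxed agent run; here the first draft always fails.
    if "revised" in payload:
        return True, "agent followed the injected instruction"
    return False, "agent ignored the instruction"
```

Calling `red_team(trajectory, "exfiltrate the saved address", demo_generate, demo_run_agent)` with the demo stubs records one failed attempt followed by one refined, successful attempt on the highest-ranked surface. In a real harness the two callables would wrap a model and an isolated browser environment.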
Evaluation Results
The researchers evaluated MUZZLE across diverse web applications, user tasks, and agent configurations. The results showed that MUZZLE effectively discovered 44 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties across different LLMs and agent scaffolds.
| Metric | Count |
|---|---|
| New attacks discovered | 44 |
| Web applications tested | 4 |
| Adversarial objectives | 10 |
| Cross-application injection attacks | 3 |
| Agent-tailored phishing scenarios | 1 |
MUZZLE also identified novel attack strategies, including 3 cross-application prompt injection attacks and an agent-tailored phishing scenario. These findings demonstrate that current web agents are vulnerable to sophisticated, adaptive attacks that can leak sensitive data or perform unauthorized actions.
Implications for Enterprise Security
As CTOs and security leaders deploy LLM-based agents to automate business processes, from customer support to supply chain operations, indirect prompt injection becomes a critical risk. MUZZLE offers a way to test agent security proactively, before deployment. Because the framework automates both attack generation and adaptation, organizations can assess their agents' resilience continuously rather than relying solely on manual red-teaming.
The authors (Georgios Syros, Evan Rose, Brian Grinstead, Christoph Kerschbaumer, William Robertson, Cristina Nita-Rotaru, and Alina Oprea) noted that MUZZLE effectively assesses agent security with minimal human intervention. While the paper focuses on general web agents, the same principles plausibly extend to specialized agents used in trade documentation, logistics portals, or customs systems, where prompt injection could compromise sensitive data or trigger unauthorized transactions.
Enterprise buyers should consider integrating automated red-teaming tools like MUZZLE into their AI security testing pipelines. The discovery of cross-application attacks highlights the need to isolate agent contexts and validate all external content before processing. The paper's results suggest that static defenses and fixed test suites are unlikely to hold up against adaptive adversaries, so security testing must evolve alongside agent behavior.
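As one illustration of context isolation, the sketch below restricts an agent's actions to the origins a task was scoped to and labels fetched page text as untrusted data. It is an assumption-laden example rather than a defense evaluated in the paper: `TaskContext`, `wrap_untrusted`, and the message format are hypothetical, and a check like this is unlikely to be sufficient on its own against adaptive attacks.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


def origin(url: str) -> str:
    """Normalize a URL to scheme://host[:port], lowercased."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


@dataclass(frozen=True)
class TaskContext:
    """Origins the user's task is allowed to touch (hypothetical policy object)."""
    allowed_origins: frozenset

    def permits(self, action_url: str) -> bool:
        # Block any action whose target lies outside the task's scope,
        # e.g. a shopping task that suddenly tries to send email.
        return origin(action_url) in self.allowed_origins


def wrap_untrusted(content: str, source_url: str) -> dict:
    """Label page content as data with provenance, not as instructions."""
    return {"role": "untrusted_data", "source": origin(source_url), "text": content}
```

For example, a context built with `TaskContext(frozenset({origin("https://shop.example/cart")}))` permits `https://shop.example/checkout` but rejects `https://mail.example/send`. Note that this simple normalization treats an explicit default port as a different origin.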