MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

MuZZLE is an automated agentic framework that evaluates the security of LLM-based web agents against indirect prompt injection attacks. It discovered 44 new attacks across 4 web applications, including cross-application injection and agent-tailored phishing, by adaptively generating context-aware malicious instructions based on agent execution trajectories.

iGEN Editorial

June 16, 2026

MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks, such as booking travel or managing supply chain portals, by directly interacting with websites. However, this design exposes them to indirect prompt injection attacks hidden in untrusted web content, allowing adversaries to hijack agent behavior and violate user intent. According to a new paper on arXiv, existing evaluations rely on fixed attack templates or narrowly scoped scenarios, limiting their ability to capture realistic attacks.

To address this gap, researchers introduced MUZZLE, an automated agentic framework for red-teaming web agents against indirect prompt injection. MUZZLE uses the agent's own execution trajectories to automatically identify high-salience injection surfaces and adaptively generate context-aware malicious instructions targeting violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on observed execution and iteratively refines attacks using feedback from failed executions.

How MUZZLE Works

MUZZLE operates with minimal human intervention. It analyzes the agent's step-by-step actions (trajectories) to pinpoint where an attacker could inject malicious content that the agent would process. It then generates adversarial objectives that aim to break security properties. The framework adapts its approach based on whether previous attacks succeeded or failed, making it more effective than fixed templates.

Evaluation Results

The researchers evaluated MUZZLE across diverse web applications, user tasks, and agent configurations. The results showed that MUZZLE effectively discovered 44 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties across different LLMs and agent scaffolds.

Metric	Count
New attacks discovered	44
Web applications tested	4
Adversarial objectives	10
Cross-application injection attacks	3
Agent-tailored phishing scenarios	1

MUZZLE also identified novel attack strategies, including 3 cross-application prompt injection attacks and an agent-tailored phishing scenario. These findings demonstrate that current web agents are vulnerable to sophisticated, adaptive attacks that can leak sensitive data or perform unauthorized actions.

Implications for Enterprise Security

As CTOs and security leaders deploy LLM-based agents for automating business processes — from customer support to supply chain operations — the risk of indirect prompt injection becomes critical. MUZZLE provides a method for proactively testing agent security before deployment. The framework's ability to automate attack generation and adaptation means organizations can continuously assess their agents' resilience without relying on manual red-teaming.

The authors — Syros, Georgios; Rose, Evan; Grinstead, Brian; Kerschbaumer, Christoph; Robertson, William; Nita-Rotaru, Cristina; and Oprea, Alina — noted that MUZZLE effectively assesses agent security with minimal human intervention. While the paper focuses on general web agents, the same principles apply to specialized agents used in trade documentation, logistics portals, or customs systems, where prompt injection could compromise sensitive data or trigger unauthorized transactions.

Enterprise buyers should consider integrating automated red-teaming tools like MUZZLE into their AI security testing pipelines. The discovery of cross-application attacks highlights the need for isolating agent contexts and validating all external content before processing. As the paper shows, fixed defenses are insufficient against adaptive adversaries — security must evolve alongside agent behavior.

Sources:

MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

How MUZZLE Works

Evaluation Results

Implications for Enterprise Security

Recommended Stories

New Research Defends LLMs from Extraction Attacks Using 'Knowledge Trap' Honeypot

New Defense Keeps Attack Success Rate Below 4% for Adaptive Prompt Injection on LLM Agents

Jailbreaking Frontier AI Models Is Cheap and Easy, New Report Warns Enterprise Users

Prompt Injection Attacks Are Thwarting AI Hacking Agents with Context Bombing