MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

MUZZLE is an automated agentic framework that evaluates the security of LLM-based web agents against indirect prompt injection attacks. It discovered 44 new attacks across 4 web applications, including cross-application injection and agent-tailored phishing, by adaptively generating context-aware malicious instructions based on agent execution trajectories.

iGEN Editorial
June 16, 2026

Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks, such as booking travel or managing supply chain portals, by directly interacting with websites. However, this design exposes them to indirect prompt injection attacks hidden in untrusted web content, allowing adversaries to hijack agent behavior and violate user intent. According to a new paper on arXiv, existing evaluations rely on fixed attack templates or narrowly scoped scenarios, limiting their ability to capture realistic attacks.

To address this gap, researchers introduced MUZZLE, an automated agentic framework for red-teaming web agents against indirect prompt injection. MUZZLE uses the agent's own execution trajectories to automatically identify high-salience injection surfaces and adaptively generate context-aware malicious instructions targeting violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on observed execution and iteratively refines attacks using feedback from failed executions.

How MUZZLE Works

MUZZLE operates with minimal human intervention. It analyzes the agent's step-by-step actions (trajectories) to pinpoint where an attacker could inject malicious content that the agent would process. It then generates adversarial objectives that aim to break security properties. The framework adapts its approach based on whether previous attacks succeeded or failed, making it more effective than fixed templates.
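The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Step`, `Finding`, `rank_surfaces`, `craft_payload`, `red_team`, `toy_agent`) is hypothetical, salience is approximated by how often the agent read an attacker-writable element, and the payload generator is a harmless string template standing in for whatever generation step the framework actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    action: str              # what the agent did at this step
    surface: str             # page element it read (hypothetical label)
    attacker_writable: bool  # could an untrusted party control this content?


@dataclass
class Finding:
    surface: str
    payload: str
    rounds: int  # attempts needed on this surface before success


def rank_surfaces(trajectory: List[Step]) -> List[str]:
    """Order attacker-writable surfaces by how often the agent read them."""
    counts: Dict[str, int] = {}
    for step in trajectory:
        if step.attacker_writable:
            counts[step.surface] = counts.get(step.surface, 0) + 1
    # Most-read first; ties broken alphabetically for determinism.
    return sorted(counts, key=lambda s: (-counts[s], s))


def craft_payload(objective: str, surface: str, feedback: str) -> str:
    """Placeholder generator: a template string, revised when feedback exists."""
    text = f"[test payload | objective={objective} | surface={surface}]"
    return f"{text} [revised using: {feedback}]" if feedback else text


RunAgent = Callable[[str, str], Tuple[bool, str]]


def red_team(trajectory: List[Step], objective: str,
             run_agent: RunAgent, max_rounds: int = 3) -> Optional[Finding]:
    """Try surfaces in salience order, refining the payload after each failure."""
    for surface in rank_surfaces(trajectory):
        feedback = ""
        for round_no in range(1, max_rounds + 1):
            payload = craft_payload(objective, surface, feedback)
            success, feedback = run_agent(surface, payload)
            if success:
                return Finding(surface, payload, round_no)
    return None


def toy_agent(surface: str, payload: str) -> Tuple[bool, str]:
    """Stub target: 'succeeds' only once the payload reflects earlier feedback."""
    hint = "reference the user's task"
    if surface == "product_review" and hint in payload:
        return True, ""
    return False, hint


trajectory = [
    Step("open_page", "nav_bar", False),
    Step("read", "seller_note", True),
    Step("read", "product_review", True),
    Step("read", "product_review", True),
]
finding = red_team(trajectory, "confidentiality", toy_agent)
print(finding)
```

With the stub agent, the first attempt on the most-read surface fails, the returned feedback is folded into the second payload, and the loop stops there. A real harness would replace `toy_agent` with an actual agent run in a sandboxed web application and would need a reliable way to judge whether the adversarial objective was met.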

Evaluation Results

The researchers evaluated MUZZLE across diverse web applications, user tasks, and agent configurations. The results showed that MUZZLE effectively discovered 44 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties across different LLMs and agent scaffolds.

Metric | Count
New attacks discovered | 44
Web applications tested | 4
Adversarial objectives | 10
Cross-application injection attacks | 3
Agent-tailored phishing scenarios | 1

MUZZLE also identified novel attack strategies, including 3 cross-application prompt injection attacks and an agent-tailored phishing scenario. These findings demonstrate that current web agents are vulnerable to sophisticated, adaptive attacks that can leak sensitive data or perform unauthorized actions.

Implications for Enterprise Security

As CTOs and security leaders deploy LLM-based agents to automate business processes, from customer support to supply chain operations, indirect prompt injection becomes a risk that must be tested for. MUZZLE provides a method for proactively testing agent security before deployment. Because the framework automates attack generation and adaptation, organizations can assess their agents' resilience continuously rather than relying only on manual red-teaming.

The authors (Georgios Syros, Evan Rose, Brian Grinstead, Christoph Kerschbaumer, William Robertson, Cristina Nita-Rotaru, and Alina Oprea) noted that MUZZLE effectively assesses agent security with minimal human intervention. While the paper focuses on general web agents, the same principles apply to specialized agents used in trade documentation, logistics portals, or customs systems, where prompt injection could compromise sensitive data or trigger unauthorized transactions.

Enterprise buyers should consider integrating automated red-teaming tools like MUZZLE into their AI security testing pipelines. The discovery of cross-application attacks highlights the need for isolating agent contexts and validating all external content before processing. The paper's results suggest that evaluations built on fixed attack templates miss adaptive attacks, so security testing must evolve alongside agent behavior.
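One way to picture context isolation is an origin allowlist applied to every action an agent proposes. The sketch below is an assumption-laden illustration, not something described in the paper: `Action`, `origin`, and `allowed` are hypothetical names, and the `.example` URLs are placeholders. It only shows the idea of refusing actions that leave the web origins named by the user's task, which is one layer among several and would not stop injections that stay within an allowed site.

```python
from dataclasses import dataclass
from typing import FrozenSet
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Action:
    kind: str        # e.g. "click", "submit", "navigate"
    target_url: str  # where the action would take effect


def origin(url: str) -> str:
    """Reduce a URL to scheme://host[:port] for comparison."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def allowed(action: Action, task_origins: FrozenSet[str]) -> bool:
    """Permit an action only if it stays within the task's origins."""
    return origin(action.target_url) in task_origins


# Hypothetical task scoped to a single shop site.
task_origins = frozenset({"https://shop.example"})
in_scope = Action("submit", "https://shop.example/checkout")
cross_app = Action("navigate", "https://mail.example/compose")
print(allowed(in_scope, task_origins), allowed(cross_app, task_origins))
```

A check like this sits outside the model, so it holds even when injected text persuades the agent to attempt a cross-application step.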
