Topic
red teaming
MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
MUZZLE is an automated agentic framework that evaluates the security of LLM-based web agents against indirect prompt injection attacks. By adaptively generating context-aware malicious instructions based on agent execution trajectories, it discovered 44 new attacks across 4 web applications, including cross-application injection and agent-tailored phishing.
New DeepTrap Framework Reveals Contextual Vulnerabilities in OpenClaw Agentic AI Systems
A new research paper presents DeepTrap, an automated framework for red-teaming agentic AI systems by discovering contextual vulnerabilities beyond user prompts. The framework was evaluated on OpenClaw, a benchmark of 42 cases across six vulnerability classes and seven operational scenarios, testing nine target models. Results show that contextual compromise can induce unsafe behavior while preserving task completion, indicating that final-response evaluation is insufficient.