A new research paper presents DeepTrap, an automated framework designed to discover contextual vulnerabilities in agentic language-model systems, specifically targeting the OpenClaw benchmark. The work addresses a critical security gap: these systems increasingly rely on mutable execution contexts—including files, memory, tools, skills, and auxiliary artifacts—creating risks that extend beyond explicit user prompts. According to the paper, DeepTrap formulates adversarial context manipulation as a black-box trajectory-level optimization problem that balances risk realization, benign-task preservation, and stealth.
The DeepTrap Framework
DeepTrap combines several advanced techniques to identify high-value compromised contexts. The framework employs:
- Risk-conditioned evaluation to assess how context manipulations affect system behavior.
- Multi-objective trajectory scoring to weigh multiple attack goals simultaneously.
- Reward-guided beam search to efficiently explore the space of possible context modifications.
- Reflection-based deep probing to iteratively refine attacks based on system responses.
According to the researchers, this approach enables the discovery of context vulnerabilities that would be missed by traditional security testing, which often focuses only on final responses.
Benchmark and Findings
The team constructed a 42-case benchmark spanning six vulnerability classes and seven operational scenarios. They evaluated nine target models using both attack and utility grading scores. The results are striking: contextual compromise could induce substantial unsafe behavior while still preserving user-facing task completion. This demonstrates, according to the authors, that "final-response evaluation is insufficient" for securing agentic AI systems. The findings underscore the need for execution-centric security evaluation that monitors the entire trajectory of system actions, not just the output.
Implications for Enterprise AI Security
For CTOs and technology leaders deploying agentic AI in their operations, the paper highlights a new class of risk. Traditional security testing that only validates final outputs may miss subtle context manipulations that lead to unsafe actions. The OpenClaw benchmark provides a standardized way to evaluate such vulnerabilities. The authors have released their code publicly, enabling organizations to test their own systems against similar attacks.
While the paper focuses on OpenClaw, the underlying principles apply broadly to any agentic system that maintains state across interactions—from customer-service chatbots to autonomous supply-chain coordinators. Enterprises should consider adopting execution-centric security evaluations as part of their AI governance frameworks.
The findings highlight the need for execution-centric security evaluation of agentic AI systems.
Given the increasing adoption of AI agents in critical business processes, this research serves as a timely reminder that security must encompass not just what the system says, but how it acts throughout a session. The DeepTrap framework offers a concrete tool for red-teaming these systems, helping organizations identify and mitigate risks before deployment.