New Attack FragFuse Exploits LLM Agent Memory to Bypass Access Controls

Researchers introduce FragFuse, a novel attack that bypasses access control in large language model agents by fragmenting prohibited queries across interactions and storing them in long-term memory, later reconstructing them without triggering defenses. The attack achieves an 86.3% average bypass success rate across multiple agent settings and exposes a critical vulnerability in memory-based AI systems.

iGEN Editorial

June 16, 2026

Enterprise adoption of large language model (LLM) agents is accelerating, but a new research paper reveals a fundamental security gap in how these agents handle memory. The attack, named FragFuse, exploits the temporal channel introduced by long-term memory to circumvent access-control mechanisms, according to a preprint on arXiv (arXiv:2606.15609).

Attack Mechanism: Fragmentation and Fusion

FragFuse operates in three stages. First, it identifies rejection-responsive fragments through black-box adaptive querying with fragment masking, which reveals which parts of a policy-violating request trigger access control. Second, it injects these fragments into the agent's long-term memory using marker carrier queries, so that they are stored in benign-appearing form. Third, a follow-up attack query retrieves and fuses the stored fragments, reconstructing the prohibited content without it ever appearing explicitly in the final user query.
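The gap these stages rely on can be illustrated with a deliberately simplified toy model. The sketch below is not the paper's method: the phrase-matching filter, the `ToyAgent` class, and the `{key}` placeholder syntax are all hypothetical stand-ins, and there is no fragment masking or surrogate optimization. It only shows why a check applied to each message in isolation can miss content that becomes prohibited once memory is fused into the query.

```python
from dataclasses import dataclass, field

# Hypothetical policy: a single blocked phrase stands in for a real
# access-control mechanism, which would be far more sophisticated.
BLOCKED_PHRASE = "export payroll records"


def per_query_check(text: str) -> bool:
    """Return True if the text passes a naive per-message filter."""
    return BLOCKED_PHRASE not in text.lower()


@dataclass
class ToyAgent:
    """Toy agent with a key-value long-term memory (illustrative only)."""

    memory: dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> bool:
        """Store a value only if it passes the per-message filter."""
        if not per_query_check(value):
            return False
        self.memory[key] = value
        return True

    def resolve(self, query: str) -> str:
        """Expand {key} placeholders using stored memory entries."""
        resolved = query
        for key, value in self.memory.items():
            resolved = resolved.replace("{" + key + "}", value)
        return resolved


agent = ToyAgent()

# Earlier interactions: each fragment looks harmless on its own.
agent.remember("a", "export payroll")
agent.remember("b", "records")

# Later interaction: the query itself contains no blocked phrase.
query = "Please {a} {b}"
print("query passes filter:", per_query_check(query))

# After memory is fused in, the text the agent acts on violates policy.
fused = agent.resolve(query)
print("fused text passes filter:", per_query_check(fused))
```

In this toy setting, every individual message passes the filter, while the text the agent finally acts on does not. The article's point is that the same kind of temporal separation can defeat much stronger access-control mechanisms than this one.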

According to the paper, this is the first attack to bypass agent access control by exploiting memory operations. The researchers developed a surrogate-based optimization scheme that tunes fusion instructions and marker designs, enabling automated attack generation without violating the attacker's threat-model assumptions.

Performance Metrics and Evaluation

The researchers evaluated FragFuse across four representative agent settings and task domains, covering three state-of-the-art agent access-control mechanisms. Key results are shown in the table below:

Metric	Value
Average bypass success rate	86.3%
Average end-to-end harmful task success rate	41.1%
Average task-success degradation vs. no access control	4.4%

Importantly, the attack's task success rate stays close to that of configurations with no access control at all, with an average degradation of only 4.4%. This indicates that bypassing access control costs the attacker little functional performance.

Defenses Ineffective

The researchers also tested alternative defenses, including state-of-the-art prompt-injection detectors and perplexity detectors. According to the paper, none of them was effective against FragFuse. This underscores a critical gap in current security approaches for LLM agents that rely on long-term memory.

Broader Implications

The FragFuse attack highlights a novel attack surface arising from agent memory operations. As enterprises increasingly deploy LLM agents for tasks like customer support, code generation, and data analysis, the ability to bypass access controls could lead to unauthorized actions or data exposure. The research suggests that memory-based architectures require fundamentally new defense mechanisms that can detect and prevent temporal fragmentation of policy-violating content.
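One way to picture what a memory-aware check might look like is to apply the policy to the fused context (retrieved memory plus the query) rather than to the query alone. The sketch below is a hypothetical illustration and not a defense proposed or evaluated in the paper: the phrase list, function names, and simple concatenation are assumptions, and phrase matching of this kind would be easy to evade in practice. It shows only where such a check would sit.

```python
# Hypothetical policy: blocked phrases stand in for a real policy engine.
BLOCKED_PHRASES = ("export payroll records",)


def violates_policy(text: str) -> bool:
    """Naive phrase match after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return any(phrase in normalized for phrase in BLOCKED_PHRASES)


def check_query_only(query: str, retrieved: list[str]) -> bool:
    """Allow the request if the query alone passes (ignores memory)."""
    return not violates_policy(query)


def check_fused_context(query: str, retrieved: list[str]) -> bool:
    """Allow the request only if the query and the fused context both pass.

    Assumption: concatenating retrieved entries in retrieval order
    approximates what the agent will act on. A real system would need to
    inspect the actual reconstructed instruction, likely with a semantic
    check rather than substring matching.
    """
    fused = " ".join([*retrieved, query])
    return not violates_policy(query) and not violates_policy(fused)


retrieved_memory = ["export payroll", "records"]
query = "Combine the saved notes and carry them out"

print("query-only check allows:", check_query_only(query, retrieved_memory))
print("fused-context check allows:", check_fused_context(query, retrieved_memory))
```

The design choice here is to move the enforcement point from the user-facing query to the content the agent actually assembles from memory. Whether a check of this kind holds up against an adaptive attacker such as FragFuse remains an open question that the article does not answer.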

While the study does not name specific commercial agents, its findings are broadly applicable to any LLM agent with long-term memory and access control. Enterprise technology leaders should evaluate the memory management and access-control implementations of any AI agents in their stack.
