iGEN
Visit IGEN World Explore IGEN Expo
EXPLORE UPGRADE PLANS
BREAKING
APEC Climate Center Upgrades El Niño to Strong; Indian Monsoon Faces Elevated Risk New Architecture GRIL Enables Gradient Descent-Like Learning in Linear Recurrent Networks ToolSelf AI Agents Achieve 28.8 Point Gain Through Runtime Self-Reconfiguration ArtNet: JEPA-Like Articulatory Framework Achieves 20.56% Error Reduction in Zero-Shot Phoneme Recognition LLM-Assisted Stance Detection in Scientific Discourse Reaches 0.76 Combined Reliability Score New Drift-RAE Method Distills Transformers Efficiently Using Representation Autoencoders Cough Regression Benchmark Reveals Trade-Offs in Respiratory Acoustic Foundation Models Spacex Acquires AI Coding Startup Cursor For $60bn Days After Bumper IPO Metacognitive Myopia in LLMs: New Framework Reveals Hidden Biases with High-Stakes Implications Lightweight Hardware-Aware Neural Architecture Search Enables CNNs on Ultra-Low-Power Microcontrollers APEC Climate Center Upgrades El Niño to Strong; Indian Monsoon Faces Elevated Risk New Architecture GRIL Enables Gradient Descent-Like Learning in Linear Recurrent Networks ToolSelf AI Agents Achieve 28.8 Point Gain Through Runtime Self-Reconfiguration ArtNet: JEPA-Like Articulatory Framework Achieves 20.56% Error Reduction in Zero-Shot Phoneme Recognition LLM-Assisted Stance Detection in Scientific Discourse Reaches 0.76 Combined Reliability Score New Drift-RAE Method Distills Transformers Efficiently Using Representation Autoencoders Cough Regression Benchmark Reveals Trade-Offs in Respiratory Acoustic Foundation Models Spacex Acquires AI Coding Startup Cursor For $60bn Days After Bumper IPO Metacognitive Myopia in LLMs: New Framework Reveals Hidden Biases with High-Stakes Implications Lightweight Hardware-Aware Neural Architecture Search Enables CNNs on Ultra-Low-Power Microcontrollers
Home ›› Technology ›› Ai ›› Llms ›› New Defense Keeps Attack Success Rate Below 4% for Adaptive Prompt Injection on LLM Agents

New Defense Keeps Attack Success Rate Below 4% for Adaptive Prompt Injection on LLM Agents

Researchers propose RETA, a training-based defense that grounds LLM agent security on user tasks rather than attack patterns. Using chain-of-thought reasoning and red-teaming with diversity reward, RETA keeps average attack success rate below 4% across six adaptive attacks while preserving utility.

iG
iGEN Editorial
June 16, 2026
New Defense Keeps Attack Success Rate Below 4% for Adaptive Prompt Injection on LLM Agents

Indirect prompt injection attacks pose a growing threat to enterprises deploying LLM-based agents in production workflows. These attacks hijack agents by embedding malicious instructions in third-party data retrieved during task execution — a common scenario in AI-powered supply chain systems, customer service bots, and document processing pipelines. Existing defenses report near-zero attack success rate on static benchmarks, but according to a new paper published on arXiv, these results collapse once the attacker is allowed to optimize against the deployed defense.

The researchers — He Lipeng, Wang Yihan, Zhang Jiawen, and N Asokan — identify two failure modes behind this collapse. First, current defense methods are confined to recognizing specific attack patterns, rather than assessing whether the intent of every embedded instruction is relevant to the user task. Second, training-based defenses, which otherwise offer the strongest safety-utility trade-off, assemble their adversarial examples from a handful of hand-crafted templates, causing the defender to fail when faced with novel attack strategies.

RETA: A Task-Centric Defense

To address these gaps, the team proposes RETA (Reasoning-enabled Task Alignment). RETA is a training-based method that grounds defense decisions on the user tasks rather than attacker-controlled data. At each tool-output step, the defender undertakes chain-of-thought reasoning to verify that its actions are consistent with the user task. This shifts the focus from recognizing attack patterns to assessing alignment with legitimate business objectives.

The system also leverages red-teaming: a simulated attacker synthesizes adversarial training data and receives a dictionary-learning diversity reward, achieving broad coverage of injection reformulation strategies. This prevents the narrow strategy distribution problem. The defender is optimized via multi-objective reinforcement learning, achieving a better safety-utility trade-off.

Quantified Results

RETA was evaluated across six black-box adaptive attacks. The results are summarized below:

Attack Scenario Attack Success Rate (Model A) Attack Success Rate (Model B)
Attack 1 <10% <10%
Attack 2 <10% <10%
Attack 3 <10% <10%
Attack 4 <10% <10%
Attack 5 <10% <10%
Attack 6 <10% <10%
Average 2.92% 3.75%

According to the paper, RETA keeps every per-attack ASR below 10%, with average success rates of 2.92% on the first target model and 3.75% on the second. Importantly, the system preserves most utility under attack and on clean inputs.

"Across six black-box adaptive attacks, RETA keeps every per-attack ASR below 10%, with average ASR of 2.92% and 3.75% on the two target models, while preserving most utility under attack and on clean inputs."

Implications for Enterprise AI Deployments

For CTOs and technology leaders deploying LLM agents in supply chain or logistics contexts, this research highlights a critical security evolution. Existing defenses that rely on pattern recognition are brittle against adaptive adversaries. RETA's task-alignment approach offers a more robust foundation, particularly for systems that retrieve and act on third-party data — such as supplier documents, shipping manifests, or trade compliance databases. The ability to maintain low attack success rates while preserving utility means enterprises can confidently integrate AI agents without compromising operational integrity or security.


Sources:

Keep Reading

Recommended Stories

SPARK Method Activates Latent Security Knowledge in LLMs for Secure Code Generation Technology

SPARK Method Activates Latent Security Knowledge in LLMs for Secure Code Generation

SPARK (Security Knowledge Priming and Representation-Guided Knowledge Activation) is a new inference-time method that improves the security of code generated by large language models without requiring retraining. The researchers argue that pretraining data already contains sufficient security material; the bottleneck is activation. Evaluated on 9 open-source and 7 proprietary models, SPARK matches or improves secure code generation baselines while preserving code utility.

June 16, 2026
New Automated Jailbreak Attack UNIATTACK Achieves High Success Rate Against Multi-Layered LLM Defenses Technology

New Automated Jailbreak Attack UNIATTACK Achieves High Success Rate Against Multi-Layered LLM Defenses

Researchers present UNIATTACK, an adversarial testing framework that extracts high-impact attack features from existing exploits and uses a specialized attacker LLM to compose flexible templates. The framework achieves an average attack success rate improvement of 64.63% to 248.82% over baselines on models with multi-layered defenses, while costing only 0.03% to 4.96% of baseline costs.

June 16, 2026
Latent Thought Flow: Efficient Reasoning in LLMs Cuts Cost and Boosts Accuracy Technology

Latent Thought Flow: Efficient Reasoning in LLMs Cuts Cost and Boosts Accuracy

Researchers propose Latent Thought Flow (LTF), a method that models LLM reasoning as continuous trajectories in latent space, using GFlowNet and entropy-weighted objectives. LTF outperforms explicit Chain-of-Thought and latent reasoning baselines, achieving 9.5% higher accuracy while cutting reasoning length by 27.2%, addressing the linguistic bottleneck that inflates inference costs.

June 16, 2026
New Survey Maps Agentic Security: Applications, Threats, and Defenses for Autonomous AI Technology

New Survey Maps Agentic Security: Applications, Threats, and Defenses for Autonomous AI

A new survey from arXiv provides the first holistic overview of agentic security, covering how LLM-based agents are used in cybersecurity, their vulnerabilities, and countermeasures. The analysis of over 260 papers reveals that agentic systems are structurally fragile and require defenses spanning the full agent lifecycle.

June 16, 2026