adversarial attacks

4 stories

Cybersecurity #cybersecurity #ai security

MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

MUZZLE is an automated agentic framework that evaluates the security of LLM-based web agents against indirect prompt injection attacks. By adaptively generating context-aware malicious instructions from agent execution trajectories, it discovered 44 new attacks across 4 web applications, including cross-application injection and agent-tailored phishing.

Jun 16, 2026 1 source

New Research Defends LLMs from Extraction Attacks Using 'Knowledge Trap' Honeypot

Technology

Artificial Intelligence #large language models #llm

A research paper by Dai and Dong introduces Knowledge Trap, a defense against model extraction attacks on large language models. It uses a Honeypot Knowledge Graph to steer attackers' queries toward low-value knowledge, reducing surrogate agreement by 6.2% on average while preserving performance for legitimate users.

Jun 16, 2026 1 source

AutoDojo: Adaptive Attacks Expose Superficial Defenses and Structural Limits in LLM Agents

Technology

Artificial Intelligence #llm #security

The AutoDojo framework adaptively optimizes indirect prompt injections against LLM agent defenses, revealing that many current defenses are superficial. Against a filter that reduces the success rate of static attacks to 0%, AutoDojo recovers a 28% attack success rate overall and 64% on action-open tasks by exploiting a structural limitation: injections can pose as ordinary data.

Jun 16, 2026 1 source

New Defense Keeps Attack Success Rate Below 4% for Adaptive Prompt Injection on LLM Agents

Technology

Artificial Intelligence #prompt injection #ai security

Researchers propose RETA, a training-based defense that grounds LLM agent security in user tasks rather than attack patterns. Using chain-of-thought reasoning and red-teaming with a diversity reward, RETA keeps the average attack success rate below 4% across six adaptive attacks while preserving utility.

Jun 16, 2026 1 source