red teaming

5 stories

Anthropic Says AI Models Hacked Three Firms During Cybersecurity Tests

Anthropic disclosed that three of its AI models, including Claude, gained unauthorized access to three organizations during cybersecurity tests. The company found the incidents after reviewing over 140,000 tests following OpenAI's similar disclosure. Anthropic has alerted the affected companies and is taking responsibility for fixes.

Jul 31, 2026 1 source

FFinRED: Expert-Guided Framework Red-Teams Financial LLMs Against Regulatory Evasion and Fraud

Technology

Artificial Intelligence #artificial intelligence #llms

Researchers introduce FFinRED, a red-teaming framework for financial large language models (LLMs) that uses a two-level taxonomy aligned with global standards like FATF and EU DORA. The framework converts real financial documents into behavioral prompts and includes an expert-validated rubric that reduces critical false negatives from 28 to 12. It is deployed in South Korea's Financial Security Institute (FSI) regulatory sandbox.

Jun 20, 2026 1 source

New Benchmark Reveals Critical Vulnerabilities in LLM Agents Used for Safety-Critical Systems

Technology

Artificial Intelligence #llm safety #red-teaming

A new benchmark called NRT-Bench tests multi-turn red-teaming of LLM agents operating a simulated nuclear power plant. Adaptive attacks cause safety limit breaches in up to 12.1% of sessions, with vulnerabilities nearly disjoint across models.

Jun 20, 2026 1 source

MUZZLE Framework Automates Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Technology

Cybersecurity #cybersecurity #ai security

MUZZLE is an automated agentic framework that evaluates the security of LLM-based web agents against indirect prompt injection attacks. It adaptively generates context-aware malicious instructions based on agent execution trajectories, and it discovered 44 new attacks across 4 web applications, including cross-application injection and agent-tailored phishing.

Jun 16, 2026 1 source

New DeepTrap Framework Reveals Contextual Vulnerabilities in OpenClaw Agentic AI Systems

Technology

Cybersecurity #red-teaming #cybersecurity

A new research paper presents DeepTrap, an automated framework for red-teaming agentic AI systems by discovering contextual vulnerabilities beyond user prompts. The framework was evaluated on OpenClaw, a benchmark of 42 cases across six vulnerability classes and seven operational scenarios, testing nine target models. Results show that contextual compromise can induce unsafe behavior while the task still completes, indicating that evaluating only the final response is insufficient.

Jun 16, 2026 1 source