AgentLeak Benchmark Reveals Internal Channel Privacy Leaks in Multi-Agent LLM Systems

A new benchmark called AgentLeak evaluates privacy leakage in multi-agent large language model (LLM) systems, finding that inter-agent messages leak at 68.8% compared to 27.2% for final outputs. Across 1,000 scenarios and five models, total system exposure reaches 68.9%, highlighting risks invisible to standard output-only audits.

iGEN Editorial

June 16, 2026

Enterprise adoption of multi-agent LLM systems introduces privacy risks that conventional output-only assessments cannot detect, according to a new benchmark from academic researchers. The benchmark, called AgentLeak, instruments seven privacy-relevant communication pathways and provides a large-scale empirical evaluation focused on final outputs, inter-agent messages, and shared memory. The findings have direct implications for any organization deploying multi-agent AI in sensitive domains such as healthcare, finance, supply chain management, and logistics.

Across 1,000 scenarios spanning healthcare, finance, legal, and corporate domains, the researchers tested five production LLMs: GPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, Mistral Large, and Llama 3.3 70B. They collected 4,979 validated execution traces to measure leakage rates.

Key Findings: Internal Channels Are the Weak Point

Multi-agent configurations reduce leakage in final outputs (C1: 27.2%) compared to single-agent mode (43.2%), but they introduce internal channels that sharply increase total system exposure. Inter-agent messages (C2) leak at 68.8%, and the aggregated leakage across final outputs, inter-agent messages, and shared memory (C1, C2, C5) reaches 68.9%. Output-only audits therefore miss 41.7% of violations, a figure that matches the gap between the aggregated rate and the final-output rate (68.9% minus 27.2%).

Channel	Leakage Rate
Single-agent final output (baseline)	43.2%
Multi-agent final output (C1)	27.2%
Inter-agent messages (C2)	68.8%
Aggregated multi-agent (C1+C2+C5)	68.9%

The pattern C2 ≥ C1 held consistently across all five models and all four domains.
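The relationship between these figures can be checked with a few lines of arithmetic. The sketch below uses only the rates from the table above. The variable names are illustrative, and the last quantity (`hidden_share`) is a derived figure that the researchers do not report. It assumes all rates are fractions of the same set of runs and that the aggregate counts every run that leaks on C1.

```python
# Leakage rates (percent) as reported in the table above.
rates = {
    "single_agent_output": 43.2,
    "multi_agent_output_c1": 27.2,
    "inter_agent_messages_c2": 68.8,
    "aggregated_c1_c2_c5": 68.9,
}

# Gap between what a full-channel audit sees and what an output-only audit sees.
audit_gap_points = round(
    rates["aggregated_c1_c2_c5"] - rates["multi_agent_output_c1"], 1
)

# Drop in final-output leakage when moving from single-agent to multi-agent mode.
output_improvement_points = round(
    rates["single_agent_output"] - rates["multi_agent_output_c1"], 1
)

# Derived here, not reported in the article: among runs that leak on any
# channel, the share whose final output shows no leak (assumption: the
# aggregate includes every run that leaks on C1).
hidden_share = round(100 * audit_gap_points / rates["aggregated_c1_c2_c5"], 1)

print(audit_gap_points, output_improvement_points, hidden_share)
```

Under those assumptions, the 16-point improvement in final outputs is more than offset by the 41.7-point gap that opens up on internal channels.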

Why This Matters for Enterprise AI

For enterprise technology leaders deploying multi-agent LLM systems — for example, in automated supply chain coordination, trade finance document processing, or logistics optimization — the research highlights that architectural coordination channels can become the primary vector for data leakage. As the researchers note, "privacy risk in multi-agent systems is strongly shaped by architectural coordination channels rather than final-output behavior alone."

"Inter-agent messages (C2) leak at 68.8%, compared with 27.2% for final outputs (C1), meaning that output-only audits miss 41.7% of violations."

This suggests that standard security practices — such as scanning only the final LLM output for sensitive data — are insufficient. Enterprises must inspect and sanitize inter-agent communication paths, shared memory, and tool arguments to mitigate total exposure.
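A minimal sketch of what a multi-channel check could look like follows. It is not AgentLeak's implementation. The `Event` type, the function names, and the verbatim substring match are all assumptions made for illustration, and a real audit would need far more robust detection than string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    channel: str  # "C1" final output, "C2" inter-agent message, "C5" shared memory
    text: str


def leaked_channels(events, secrets):
    """Return the set of channels where any secret appears verbatim (case-insensitive)."""
    hit = set()
    for event in events:
        lowered = event.text.lower()
        if any(secret.lower() in lowered for secret in secrets):
            hit.add(event.channel)
    return hit


def missed_by_output_audit(events, secrets):
    """True when something leaked, but not in the final output (C1)."""
    hit = leaked_channels(events, secrets)
    return bool(hit) and "C1" not in hit


# Hypothetical trace: the coordinator forwards a diagnosis to a worker,
# while the final answer shown to the user stays clean.
trace = [
    Event("C2", "Worker, patient Jane Roe has diagnosis E11.9, please schedule."),
    Event("C5", "appointment_slot=2026-07-01T09:00"),
    Event("C1", "Your appointment has been scheduled."),
]
secrets = ["E11.9"]

print(sorted(leaked_channels(trace, secrets)))
print(missed_by_output_audit(trace, secrets))
```

In this example the final output passes an output-only scan even though the sensitive value crossed an internal channel, which is the blind spot the benchmark describes.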

Research Methodology and Scope

The study, authored by Faouzi El Yagoubi, Godwin Badu-Marfo, and Ranwa Al Mallah, was released on arXiv (paper ID 2602.11510). It defines AgentLeak as a benchmark that instruments seven privacy-relevant pathways, though the current evaluation focuses on final outputs (C1), inter-agent messages (C2), and shared memory (C5). The evaluation used a coordinator-worker multi-agent architecture.

Implications for Supply Chain and Logistics

While the benchmark domains do not explicitly include supply chain or trade, the findings plausibly carry over. Multi-agent systems are increasingly used in logistics for tasks like real-time route optimization, customs document handling, and supplier coordination. In these contexts, inter-agent messages often contain proprietary pricing data, contract terms, customer identities, or trade secrets. If inter-agent channels in such deployments leaked at a rate comparable to the 68.8% measured in the benchmark, that information could reach unintended parties, including competitors or malicious actors.

Technology procurement leaders evaluating multi-agent AI platforms should demand from vendors:

Audit trails of all inter-agent messages
Redaction or encryption of internal communication channels
Benchmarking against tools like AgentLeak before deployment
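The first two items can be combined in a thin relay layer that sits between agents. The sketch below is an illustration under stated assumptions, not a vendor feature or part of AgentLeak. `relay`, `redact`, and the two regular expressions are hypothetical, and pattern-based redaction will miss sensitive data that does not fit a fixed format.

```python
import re

# Hypothetical patterns for illustration only; real deployments need broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each pattern match with a placeholder and count the replacements."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


def relay(sender, receiver, text, audit_log):
    """Redact a message, record it in the audit log, and return what the receiver sees."""
    clean, counts = redact(text)
    audit_log.append(
        {"from": sender, "to": receiver, "message": clean, "redactions": counts}
    )
    return clean


audit_log = []
delivered = relay(
    "coordinator",
    "worker",
    "Contact jane.doe@example.com, SSN 123-45-6789",
    audit_log,
)
print(delivered)
print(audit_log[0]["redactions"])
```

Logging the redacted text rather than the original keeps the audit trail itself from becoming another copy of the sensitive data.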

Competitive and Industry Context

The five LLMs evaluated come from four providers: OpenAI, Anthropic, Mistral AI, and Meta. No single model performed uniformly better on internal-channel leakage, indicating that the architectural design of multi-agent systems, not just the model choice, drives privacy outcomes.

The AgentLeak benchmark itself is, as of publication, a research artifact rather than a commercial product. However, it provides a methodology that could be adopted or adapted by enterprise security teams or third-party auditors. Vendors and maintainers of multi-agent orchestration frameworks (e.g., CrewAI, AutoGen) may face increased scrutiny over internal data handling.

The Bottom Line

For any organization deploying multi-agent LLMs in production, the AgentLeak findings underscore a critical blind spot: standard output-level defenses cannot see the most significant leak path. Enterprises should immediately begin auditing their multi-agent architectures for internal-channel leakage, using benchmarks like AgentLeak as a reference. The technology stack — whether built on top of GPT-4o, Claude, or Llama — must incorporate privacy controls at the agent communication layer.
