Privacy-Preserving Text Sanitization for Distributed Agents via Disentangled Representations

Researchers propose DiSan, a privacy-preserving text sanitization framework that uses disentangled representations to separate task semantics from source-identifying style. Experiments show it cuts answer-level exposure of personally identifiable information (PII) 20-fold while maintaining 83% answer faithfulness on a multi-agent RAG benchmark, outperforming token-level masking.

iGEN Editorial

June 16, 2026

According to a paper published on arXiv, privacy leakage between distributed agents that exchange text across organizational boundaries arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and syntactic patterns. The researchers introduce DiSan (Disentangled Sanitization), a framework that protects such exchanges by factorizing text into a source-invariant role subspace, which preserves task semantics, and a source-identifying style subspace, which remains local.

The Limits of Token-Level Masking

Traditional approaches often rely on masking personally identifiable information (PII) tokens. However, the paper demonstrates that this method is insufficient. Specifically, masking 19.2% of tokens reduces TF-IDF stylometric attribution by only 18.6%. This suggests that stylistic fingerprints persist even after heavy masking.

Method	Token Masking Rate	Stylometric Attribution Reduction	PII Exposure Reduction	Answer Faithfulness
Token-level masking	19.2%	18.6%	—	—
DiSan (answer-level)	—	73.2% (TF-IDF), 70.6% (neural probe)	20× reduction	83%
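
To make the masking result concrete, the following is a minimal sketch of a TF-IDF stylometric attribution attack run on masked text. It is not the paper's attack or data: the two-organization corpus, the `PII_TOKENS` set, and every function name are hypothetical, and the classifier is a simple average cosine similarity. It only illustrates why removing names leaves phrasing habits intact.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): two organizations with distinct phrasing habits.
TRAIN = {
    "org_a": [
        "kindly find attached the shipment manifest for alice",
        "kindly confirm receipt of the invoice from alice",
    ],
    "org_b": [
        "hey quick heads up the shipment from bob is late",
        "hey can you ping bob about the invoice asap",
    ],
}
PII_TOKENS = {"alice", "bob"}  # stand-in for a real PII detector (assumption)


def mask(text):
    """Token-level masking: replace known PII tokens with a placeholder."""
    return " ".join("[MASK]" if t in PII_TOKENS else t for t in text.lower().split())


def fit_idf(docs):
    """Smoothed inverse document frequency over whitespace tokens."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def vectorize(doc, idf):
    """Sparse TF-IDF vector; tokens unseen during fitting are dropped."""
    tf = Counter(doc.split())
    return {tok: cnt * idf[tok] for tok, cnt in tf.items() if tok in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def attribute(text, train, idf):
    """Guess the source whose documents are most similar on average."""
    query = vectorize(text, idf)
    scores = {
        source: sum(cosine(query, vectorize(d, idf)) for d in docs) / len(docs)
        for source, docs in train.items()
    }
    return max(scores, key=scores.get)


# The attacker only ever sees masked text, for both fitting and querying.
MASKED_TRAIN = {src: [mask(d) for d in docs] for src, docs in TRAIN.items()}
IDF = fit_idf([d for docs in MASKED_TRAIN.values() for d in docs])

if __name__ == "__main__":
    unseen = mask("kindly send the manifest for alice")
    print(unseen)                                 # the name is gone
    print(attribute(unseen, MASKED_TRAIN, IDF))   # the source is still guessed
```

In this toy setting the masked message is still attributed to the organization that wrote it, because words such as "kindly" or "hey" survive masking. A real evaluation would use held-out data and report attribution accuracy before and after sanitization.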

Disentangled Sanitization via DiSan

DiSan uses a two-stream encoder to separate content from style. One stream captures the role-specific semantics needed for the task, while the other encodes stylistic patterns that could identify the source. The framework employs federated prototype alignment and adversarial regularization to enable joint training without centralizing raw text. This allows multiple distributed agents to collaborate while keeping their stylistic patterns local.
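
The article does not spell out how federated prototype alignment is implemented, so the following is only a sketch of one common reading of the idea, under stated assumptions: each agent averages its role-stream vectors per role label into a local prototype, uploads only those prototypes and their counts, and the server returns a count-weighted global prototype that each agent pulls its role vectors toward. All function names, the role labels, and the squared-distance loss are hypothetical, and the adversarial regularizer and the style stream are left out.

```python
from collections import defaultdict


def local_prototypes(role_vectors, role_labels):
    """Client side: per-role mean of role-stream vectors, plus example counts."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(role_vectors, role_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    protos = {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }
    return protos, dict(counts)


def aggregate_prototypes(client_payloads):
    """Server side: count-weighted average of the uploaded prototypes.

    Only prototypes and counts arrive here; raw text and style vectors do not.
    """
    sums, counts = {}, defaultdict(int)
    for protos, cnts in client_payloads:
        for label, proto in protos.items():
            n = cnts[label]
            if label not in sums:
                sums[label] = [0.0] * len(proto)
            sums[label] = [s + n * x for s, x in zip(sums[label], proto)]
            counts[label] += n
    return {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }


def alignment_loss(role_vectors, role_labels, global_protos):
    """Mean squared distance between each role vector and its global prototype."""
    total = 0.0
    for vec, label in zip(role_vectors, role_labels):
        total += sum((x - g) ** 2 for x, g in zip(vec, global_protos[label]))
    return total / len(role_vectors)


if __name__ == "__main__":
    # Two hypothetical agents, each holding role vectors for the role "query".
    vecs_a, labels_a = [[1.0, 0.0], [3.0, 0.0]], ["query", "query"]
    vecs_b, labels_b = [[5.0, 3.0]], ["query"]

    payloads = [
        local_prototypes(vecs_a, labels_a),
        local_prototypes(vecs_b, labels_b),
    ]
    global_protos = aggregate_prototypes(payloads)
    print(global_protos)
    print(alignment_loss(vecs_a, labels_a, global_protos))
```

The design point this illustrates is that the only values crossing an organizational boundary are per-role averages, which is one way joint training could proceed without centralizing raw text.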

DiSan is described as a built-in component of Intern-Shannon, a broader system for multi-agent collaboration. The paper does not provide further details on Intern-Shannon's architecture.

Measuring Effectiveness

The researchers evaluated DiSan on two benchmarks:

Distributed multi-agent RAG benchmark: DiSan reduces answer-level PII exposure by 20 times while maintaining 83% answer faithfulness.
Enron email dataset: DiSan lowers stylometric attribution by 73.2% under TF-IDF analysis and 70.6% under a neural probe attack.

These results indicate that disentangled representations provide stronger empirical privacy protection than identifier masking alone, because they address both explicit identifiers and latent stylistic signatures.
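
The reported figures mix two units, a fold reduction (20 times) and percentage reductions (73.2%, 70.6%, 18.6%). The short sketch below converts between them, assuming each figure is a relative reduction of its own metric, which the article does not state explicitly. The function names are illustrative. The figures come from different metrics and datasets, so the conversion puts them on a common scale but does not make them directly comparable.

```python
def fold_to_percent(fold):
    """A k-fold reduction leaves 1/k of the original value, i.e. (1 - 1/k) * 100 %."""
    return (1.0 - 1.0 / fold) * 100.0


def percent_to_fold(percent):
    """A p % reduction leaves (1 - p/100) of the original value."""
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    # PII exposure on the multi-agent RAG benchmark: 20-fold is about a 95% reduction.
    print(round(fold_to_percent(20), 1))
    # Stylometric attribution reductions expressed as fold factors.
    for label, pct in [
        ("token masking, TF-IDF", 18.6),
        ("DiSan, TF-IDF", 73.2),
        ("DiSan, neural probe", 70.6),
    ]:
        print(label, round(percent_to_fold(pct), 2))
```

Under that assumption, an 18.6% reduction corresponds to roughly a 1.2-fold drop in attribution, while 73.2% corresponds to roughly a 3.7-fold drop.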

Implications for Enterprise Data Sharing

For organizations that rely on distributed agents, such as supply chain partners exchanging logistics data or financial institutions collaborating on trade finance, the threat of information leakage extends beyond explicit names and numbers. Stylistic patterns can inadvertently reveal organizational identity or even individual authorship. DiSan's approach offers a formal framework to strip away these signals while preserving the semantic content needed for collaborative tasks. The technique is agnostic to the underlying task, making it potentially applicable to any text-based multi-agent system where privacy is a concern. Future work may explore integration with existing privacy-preserving technologies like federated learning and differential privacy.

Sources:

Privacy-Preserving Text Sanitization for Distributed Agents via Disentangled Representations

