When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and syntactic patterns, according to a paper published on arXiv. The researchers introduce DiSan (Disentangled Sanitization), a framework designed to protect such exchanges by factorizing text into a source-invariant role subspace that preserves task semantics and a source-identifying style subspace that remains local.
The Limits of Token-Level Masking
Traditional approaches often rely on masking personally identifiable information (PII) tokens. The paper reports that this is not enough: masking 19.2% of tokens reduces TF-IDF stylometric attribution by only 18.6%. In other words, stylistic fingerprints persist even after roughly one in five tokens has been masked.
| Method | Token Masking Rate | Stylometric Attribution Reduction | PII Exposure Reduction | Answer Faithfulness |
|---|---|---|---|---|
| Token-level masking | 19.2% | 18.6% (TF-IDF) | not reported | not reported |
| DiSan | not reported | 73.2% (TF-IDF), 70.6% (neural probe), Enron emails | 20× (multi-agent RAG) | 83% (multi-agent RAG) |
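To make the masking result more concrete, the following is a minimal sketch of why identifier masking can leave a stylometric attacker largely unaffected. It is not the paper's evaluation code: the toy corpus, the labels `org_a` and `org_b`, and the helpers `mask_tokens`, `tfidf_vectors`, `cosine`, and `attribute` are all hypothetical, and the attacker is a simple nearest-neighbor match over TF-IDF vectors.

```python
import math
from collections import Counter

MASK = "[MASK]"

def mask_tokens(text, sensitive):
    """Replace every whitespace token found in `sensitive` with MASK."""
    return " ".join(
        MASK if tok.lower().strip(".,") in sensitive else tok
        for tok in text.split()
    )

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    return [
        {tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def attribute(query, labeled_docs):
    """Guess the query's source as the label of its most similar labeled document."""
    # The query is included when fitting IDF, which is acceptable for a toy attacker.
    vectors = tfidf_vectors([text for text, _ in labeled_docs] + [query])
    query_vec = vectors[-1]
    best = max(range(len(labeled_docs)), key=lambda i: cosine(query_vec, vectors[i]))
    return labeled_docs[best][1]

# Made-up messages from two hypothetical organizations with different house styles.
train = [
    ("Kindly note that the invoice for Alice has been processed. Regards.", "org_a"),
    ("Kindly note that the order for Alice has been approved. Regards.", "org_a"),
    ("hey fyi invoice for Bob went out today thx", "org_b"),
    ("hey fyi order for Bob got approved today thx", "org_b"),
]
query = "Kindly note that the shipment for Alice has been dispatched. Regards."
sensitive = {"alice", "bob"}  # assumed output of some PII detector

# Mask the names everywhere, then let the attacker attribute the masked query.
masked_train = [(mask_tokens(text, sensitive), label) for text, label in train]
masked_query = mask_tokens(query, sensitive)
guess = attribute(masked_query, masked_train)

print(masked_query)
print("attributed to:", guess)
```

In this toy setting the names are gone, yet the attacker still picks the right source, because phrasing such as "Kindly note that" and "Regards." survives the masking. That is the kind of residual signal the reported 18.6% figure points to.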
Disentangled Sanitization via DiSan
DiSan uses a two-stream encoder to separate content from style. One stream captures the role-specific semantics needed for the task, while the other encodes stylistic patterns that could identify the source. The framework employs federated prototype alignment and adversarial regularization to enable joint training without centralizing raw text. This allows multiple distributed agents to collaborate while keeping their stylistic patterns local.
DiSan is described as a built-in component of Intern-Shannon, a broader system for multi-agent collaboration. The paper does not provide further details on Intern-Shannon's architecture.
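The article does not give DiSan's loss functions, so the following is only a generic sketch of how federated prototype alignment could work for the role subspace, under assumptions. Each client averages its role-stream embeddings per role label, shares only those averages (prototypes) with a server, and is then penalized for drifting from the aggregated prototypes. The function names, the squared-distance loss, and the count-weighted aggregation are illustrative choices, not details from the paper. The style stream and the adversarial regularizer are omitted.

```python
from collections import defaultdict

def local_prototypes(role_embeddings):
    """Per-role mean of role-stream embeddings, computed on one client.

    role_embeddings: list of (role_label, vector) pairs.
    Returns {role_label: (mean_vector, count)}.
    """
    sums, counts = {}, defaultdict(int)
    for label, vec in role_embeddings:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: ([s / counts[label] for s in total], counts[label])
        for label, total in sums.items()
    }

def aggregate_prototypes(client_protos):
    """Server step: count-weighted average of the prototypes sent by all clients."""
    sums, totals = {}, defaultdict(int)
    for protos in client_protos:
        for label, (vec, count) in protos.items():
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + count * x for s, x in zip(sums[label], vec)]
            totals[label] += count
    return {label: [s / totals[label] for s in total] for label, total in sums.items()}

def alignment_loss(local, global_protos):
    """Mean squared distance between a client's prototypes and the global ones."""
    per_role = [
        sum((a - b) ** 2 for a, b in zip(vec, global_protos[label]))
        for label, (vec, _) in local.items()
    ]
    return sum(per_role) / len(per_role)

# Toy 2-D role embeddings for two hypothetical clients; raw text never leaves a client.
client_a = [("query", [1.0, 0.0]), ("query", [3.0, 0.0]), ("answer", [0.0, 2.0])]
client_b = [("query", [2.0, 2.0]), ("answer", [0.0, 4.0])]

protos_a = local_prototypes(client_a)
protos_b = local_prototypes(client_b)
global_protos = aggregate_prototypes([protos_a, protos_b])

loss_a = alignment_loss(protos_a, global_protos)
loss_b = alignment_loss(protos_b, global_protos)
print(global_protos)
print(loss_a, loss_b)
```

In a real training loop this loss would be added to each client's objective so that role embeddings from different sources converge on shared prototypes, while style embeddings are never transmitted.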
Measuring Effectiveness
The researchers evaluated DiSan on two benchmarks:
- Distributed multi-agent RAG benchmark: DiSan reduces answer-level PII exposure by a factor of 20 while maintaining 83% answer faithfulness.
- Enron email dataset: DiSan lowers stylometric attribution by 73.2% under TF-IDF analysis and 70.6% under a neural probe attack.
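These results indicate that disentangled representations provide stronger empirical privacy protection than identifier masking alone, because they address both explicit identifiers and latent stylistic signatures. The reported numbers are measured reductions against specific attacks, not a formal guarantee.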
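The results above mix two ways of stating a reduction: a fold change (20×) and percentages (73.2%, 70.6%, 18.6%). The short sketch below converts between the two so the figures can be read on one scale. The helper names are hypothetical, and the conversion is plain arithmetic. It does not imply that PII exposure and stylometric attribution are the same metric or were measured on the same benchmark.

```python
def fold_to_percent(fold):
    """A k-fold reduction leaves 1/k of the baseline, so it removes (1 - 1/k) of it."""
    return 100.0 * (1.0 - 1.0 / fold)

def percent_to_fold(percent):
    """A p% reduction leaves (1 - p/100) of the baseline, a 1 / (1 - p/100) fold drop."""
    return 1.0 / (1.0 - percent / 100.0)

# Figures as reported in the article.
pii_percent = fold_to_percent(20)      # 20x lower answer-level PII exposure
tfidf_fold = percent_to_fold(73.2)     # DiSan, TF-IDF attribution on Enron
probe_fold = percent_to_fold(70.6)     # DiSan, neural probe on Enron
masking_fold = percent_to_fold(18.6)   # token-level masking, TF-IDF attribution

print(f"20x reduction  = {pii_percent:.1f}% reduction")
print(f"73.2% reduction = {tfidf_fold:.2f}x")
print(f"70.6% reduction = {probe_fold:.2f}x")
print(f"18.6% reduction = {masking_fold:.2f}x")
```

Read this way, a 20× reduction corresponds to removing about 95% of the baseline exposure, while the 73.2% and 18.6% attribution reductions correspond to drops of roughly 3.7× and 1.2× respectively.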
Implications for Enterprise Data Sharing
For organizations that rely on distributed agents, such as supply chain partners exchanging logistics data or financial institutions collaborating on trade finance, the threat of information leakage extends beyond explicit names and numbers. Stylistic patterns can inadvertently reveal organizational identity or even individual authorship. DiSan's approach offers a way to strip out these signals while preserving the semantic content needed for collaborative tasks. The technique is agnostic to the underlying task, making it potentially applicable to any text-based multi-agent system where privacy is a concern. Future work may explore combining it with other privacy-preserving technologies such as differential privacy, alongside the federated training it already uses.