
Privacy-Preserving Text Sanitization for Distributed Agents via Disentangled Representations

Researchers propose DiSan, a privacy-preserving text sanitization framework that uses disentangled representations to separate task semantics from source-identifying style. Experiments show it cuts exposure of personally identifiable information 20-fold while maintaining 83% answer faithfulness on a multi-agent RAG benchmark, outperforming token-level masking.

iGEN Editorial
June 16, 2026

When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and syntactic patterns, according to a paper published on arXiv. The researchers introduce DiSan (Disentangled Sanitization), a framework designed to protect such exchanges by factorizing text into a source-invariant role subspace that preserves task semantics and a source-identifying style subspace that remains local.

The Limits of Token-Level Masking

Traditional approaches often rely on masking personally identifiable information (PII) tokens. However, the paper demonstrates that this method is insufficient. Specifically, masking 19.2% of tokens reduces TF-IDF stylometric attribution by only 18.6%. This suggests that stylistic fingerprints persist even after heavy masking.
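To see why masking identifiers leaves a stylistic fingerprint, consider a minimal sketch of TF-IDF attribution. Everything here is an illustrative assumption rather than the paper's setup: the two toy source profiles, the whitespace tokenizer, the smoothed IDF formula, and the helper names are all hypothetical. The attacker assigns a message to whichever source profile is closest in cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy profiles: one block of known text per source organization.
PROFILES = {
    "org_a": "kindly find attached the shipment report for alice "
             "kindly confirm receipt of the invoice for bob",
    "org_b": "hey fyi the shipment report is up ping carol "
             "hey quick note invoice got sent to dave",
}

def build_idf(docs):
    """Smoothed inverse document frequency over whitespace tokens."""
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d.lower().split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def vectorize(text, idf):
    """Sparse TF-IDF vector; tokens unseen in the profiles are dropped."""
    tf = Counter(tok for tok in text.lower().split() if tok in idf)
    return {tok: count * idf[tok] for tok, count in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def attribute(text, profiles, idf):
    """Return the source whose profile is most similar to the text."""
    query = vectorize(text, idf)
    return max(profiles, key=lambda src: cosine(query, vectorize(profiles[src], idf)))

def mask_tokens(text, sensitive):
    """Token-level masking: replace listed identifiers with a placeholder."""
    return " ".join("[MASK]" if tok.lower() in sensitive else tok
                    for tok in text.split())

IDF = build_idf(list(PROFILES.values()))
message = "kindly find attached the invoice for erin"
masked = mask_tokens(message, {"erin"})

print(masked)
print(attribute(masked, PROFILES, IDF))
```

In this toy case the name is gone from the masked message, yet phrases such as "kindly find attached" still point to the same source, which is the kind of residual signal the paper's 18.6% figure describes.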

Method                 Token masking rate   Stylometric attribution reduction      PII exposure reduction   Answer faithfulness
Token-level masking    19.2%                18.6% (TF-IDF)                         -                        -
DiSan (answer-level)   -                    73.2% (TF-IDF), 70.6% (neural probe)   20×                      83%

Disentangled Sanitization via DiSan

DiSan uses a two-stream encoder to separate content from style. One stream captures the role-specific semantics needed for the task, while the other encodes stylistic patterns that could identify the source. The framework employs federated prototype alignment and adversarial regularization to enable joint training without centralizing raw text. This allows multiple distributed agents to collaborate while keeping their stylistic patterns local.
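The article does not spell out how federated prototype alignment is computed, so the following is only a sketch of one common way such a step can work, not DiSan's actual procedure. It assumes each agent already has role-stream embeddings as plain lists of floats, that agents share only per-role mean vectors and counts, and that a coordinator takes a count-weighted average. The function names and the squared-distance loss are hypothetical, and the adversarial regularizer is not covered.

```python
from collections import defaultdict

def local_prototypes(embeddings, roles):
    """Per-role mean vector and sample count, computed on one agent's private data."""
    sums, counts = {}, defaultdict(int)
    for vec, role in zip(embeddings, roles):
        if role not in sums:
            sums[role] = [0.0] * len(vec)
        sums[role] = [s + x for s, x in zip(sums[role], vec)]
        counts[role] += 1
    return {r: ([s / counts[r] for s in sums[r]], counts[r]) for r in sums}

def aggregate_prototypes(agent_protos):
    """Count-weighted average of the prototypes reported by all agents."""
    totals, counts = {}, defaultdict(int)
    for protos in agent_protos:
        for role, (vec, n) in protos.items():
            if role not in totals:
                totals[role] = [0.0] * len(vec)
            totals[role] = [t + n * x for t, x in zip(totals[role], vec)]
            counts[role] += n
    return {r: [t / counts[r] for t in totals[r]] for r in totals}

def alignment_loss(embeddings, roles, global_protos):
    """Mean squared distance between each embedding and its role's global prototype."""
    total = 0.0
    for vec, role in zip(embeddings, roles):
        total += sum((x - p) ** 2 for x, p in zip(vec, global_protos[role]))
    return total / len(embeddings)

# Hypothetical role-stream embeddings held by two agents; raw text never leaves either one.
emb_a, roles_a = [[1.0, 0.0], [3.0, 0.0]], ["query", "query"]
emb_b, roles_b = [[2.0, 3.0]], ["query"]

global_protos = aggregate_prototypes([
    local_prototypes(emb_a, roles_a),
    local_prototypes(emb_b, roles_b),
])
print(global_protos["query"])
print(alignment_loss(emb_a, roles_a, global_protos))
```

Only the prototypes and counts cross the organizational boundary in this sketch. Each agent would then minimize the alignment loss locally so that its role embeddings move toward the shared prototypes.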

DiSan is described as a built-in component of Intern-Shannon, a broader system for multi-agent collaboration. The paper does not provide further details on Intern-Shannon's architecture.

Measuring Effectiveness

The researchers evaluated DiSan on two benchmarks:

  • Distributed multi-agent RAG benchmark: DiSan reduces answer-level PII exposure by a factor of 20 while maintaining 83% answer faithfulness.
  • Enron email dataset: DiSan lowers stylometric attribution by 73.2% under TF-IDF analysis and 70.6% under a neural probe attack.
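The two benchmarks report reductions in different units, a fold change for PII exposure and a percentage for stylometric attribution. The short sketch below shows how the two relate. The helper names and the exposure counts are hypothetical and chosen only to make the arithmetic visible. The article does not report raw counts.

```python
def relative_reduction(before, after):
    """Fraction of the baseline removed, e.g. 0.186 for an 18.6% reduction."""
    return (before - after) / before

def fold_to_relative(fold):
    """Convert an N-fold reduction into the equivalent fractional reduction."""
    return 1.0 - 1.0 / fold

# Hypothetical counts: 200 exposed identifiers before sanitization, 10 after.
before, after = 200, 10
fold = before / after

print(f"{fold:.0f}-fold reduction")
print(f"{relative_reduction(before, after):.1%} relative reduction")
print(f"{fold_to_relative(20):.1%} equivalent for a 20-fold reduction")
```

Under this conversion, a 20-fold reduction means about 95% of the baseline exposure is removed, which puts the PII result on the same scale as the 73.2% and 70.6% attribution reductions.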

These results indicate that disentangled representations offer stronger empirical privacy protection than identifier masking alone, because they address both explicit identifiers and latent stylistic signatures.

Implications for Enterprise Data Sharing

For organizations that rely on distributed agents, such as supply chain partners exchanging logistics data or financial institutions collaborating on trade finance, the threat of information leakage extends beyond explicit names and numbers. Stylistic patterns can inadvertently reveal organizational identity or even individual authorship. DiSan's approach offers a formal framework for stripping away these signals while preserving the semantic content that collaborative tasks need. The technique is agnostic to the underlying task, which makes it potentially applicable to any text-based multi-agent system where privacy is a concern. Future work may explore integration with existing privacy-preserving technologies such as federated learning and differential privacy.

