Mosaic: Data-Free Knowledge Distillation Framework Uses Mixture-of-Experts to Tackle Heterogeneous Federated Learning

Researchers propose Mosaic, a novel data-free knowledge distillation framework that leverages Mixture-of-Experts (MoE) to overcome model and data heterogeneity in federated learning. Mosaic trains local generative models to synthesize data, forms an MoE from client models, and distills it into a global model. Experiments show consistent outperformance over state-of-the-art approaches on image and multimodal benchmarks.

iGEN Editorial

June 16, 2026

Enterprise AI teams deploying federated learning across heterogeneous hardware and data distributions face a persistent challenge: divergent model representations that degrade global performance. According to a preprint on arXiv, researchers have proposed Mosaic, a data-free knowledge distillation framework that uses a Mixture-of-Experts (MoE) architecture to address both model and data heterogeneity without accessing raw client data.

The Challenge of Heterogeneity in Federated Learning

Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy, the researchers explained. However, the coexistence of model heterogeneity (different architectures across clients) and data heterogeneity (non-IID data distributions) gives rise to inconsistent representations and divergent optimization dynamics across clients, ultimately hindering robust global performance.
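To make "non-IID" concrete, the sketch below splits a labeled dataset across clients with Dirichlet-distributed class shares, a common way to simulate data heterogeneity in federated learning experiments. Whether the Mosaic authors use this exact scheme is not stated here; the function name `dirichlet_partition` and the parameter `alpha` are illustrative assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, rng):
    """Assign sample indices to clients; smaller alpha gives more skewed class mixes."""
    client_indices = [[] for _ in range(num_clients)]
    for label in np.unique(labels):
        # Shuffle the indices of this class, then cut them by Dirichlet shares.
        idx = rng.permutation(np.flatnonzero(labels == label))
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)   # toy labels for 10 classes
parts = dirichlet_partition(labels, num_clients=5, alpha=0.3, rng=rng)

# Per-client class histograms show how uneven the local distributions are.
histograms = [np.bincount(labels[p], minlength=10) for p in parts]
```

With a small `alpha`, each client tends to see only a few classes, which is the setting in which locally trained models start to disagree.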

Traditional knowledge distillation methods often require a labeled public dataset or access to real client data, which can violate privacy constraints. Data-free approaches attempt to generate synthetic data, but existing methods struggle when client models differ significantly.

How Mosaic Works: Data-Free Distillation with Mixture-of-Experts

Mosaic introduces a multi-step process to overcome these limitations. First, it trains local generative models on each client to approximate that client's personalized data distribution. These generative models enable synthetic data generation that safeguards privacy through strict separation from real data, according to the paper.
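As a rough illustration of this first step, the sketch below fits a very simple class-conditional Gaussian to one client's private features and then samples synthetic points from it. This is a stand-in chosen for brevity, not the generative model used in the paper; the helper names `fit_class_gaussians` and `sample_synthetic` are hypothetical.

```python
import numpy as np

def fit_class_gaussians(x, y):
    """Fit a per-class mean and standard deviation on a client's private data."""
    stats = {}
    for label in np.unique(y):
        rows = x[y == label]
        stats[int(label)] = (rows.mean(axis=0), rows.std(axis=0) + 1e-6)
    return stats

def sample_synthetic(stats, per_class, rng):
    """Draw labeled synthetic samples from the fitted per-class Gaussians."""
    xs, ys = [], []
    for label, (mean, std) in stats.items():
        xs.append(rng.normal(mean, std, size=(per_class, mean.shape[0])))
        ys.append(np.full(per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

rng = np.random.default_rng(0)
# Toy private data for one client: class 0 centered at -2, class 1 at +2.
client_x = np.concatenate([rng.normal(-2, 1, (200, 3)), rng.normal(2, 1, (200, 3))])
client_y = np.array([0] * 200 + [1] * 200)

stats = fit_class_gaussians(client_x, client_y)            # stays on the client
synth_x, synth_y = sample_synthetic(stats, per_class=50, rng=rng)
```

Only samples drawn from the fitted model leave the client in this sketch; the raw `client_x` rows are never shared.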

Next, Mosaic forms a Mixture-of-Experts (MoE) from the client models based on their specialized knowledge. The MoE architecture combines outputs from multiple 'expert' models, each potentially specialized in a subset of the data distribution. Mosaic then distills this ensemble into a single global model using the generated synthetic data.
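The distillation step can be sketched as follows, assuming the experts' class probabilities are mixed with per-sample gate weights and the global (student) model is trained to minimize the KL divergence to that mixture. The loss form, the uniform gate, and all function names here are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def moe_teacher_probs(expert_logits, gate_weights):
    """expert_logits: (E, N, C); gate_weights: (N, E) with rows summing to 1."""
    expert_probs = softmax(expert_logits, axis=-1)
    # Weighted average of expert distributions for every sample.
    return np.einsum("ne,enc->nc", gate_weights, expert_probs)

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """Mean KL(teacher || student) over the batch."""
    student_probs = softmax(student_logits)
    kl = teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps))
    return float(kl.sum(axis=-1).mean())

rng = np.random.default_rng(0)
num_experts, batch, num_classes = 3, 4, 5
# Stand-ins for client model outputs on a batch of synthetic samples.
expert_logits = rng.normal(size=(num_experts, batch, num_classes))
gate_weights = np.full((batch, num_experts), 1.0 / num_experts)  # uniform gate

teacher = moe_teacher_probs(expert_logits, gate_weights)
student_logits = rng.normal(size=(batch, num_classes))
loss = distillation_loss(student_logits, teacher)
```

Because only output distributions are combined, the experts can have entirely different architectures as long as they share a label space.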

To further enhance the MoE integration, Mosaic incorporates a lightweight meta model trained on a few representative prototypes. This meta model learns how much weight to give each expert's prediction, which improves distillation quality even when client models have very different architectures.
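One way to picture such a meta model is a small softmax gate fitted on labeled prototypes so that the mixture assigns high probability to the correct class. The sketch below uses a linear gate and plain gradient descent on the mixture's negative log-likelihood; the architecture, the training objective, and names such as `train_gate` are assumptions, since the meta model's details are not described here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixture_nll(weights, features, expert_true_probs):
    """Negative log-likelihood of the true label under the gated mixture."""
    gate = softmax(features @ weights)
    mix = (gate * expert_true_probs).sum(axis=1)
    return float(-np.log(mix).mean())

def train_gate(features, expert_true_probs, lr=0.5, steps=300):
    """features: (N, D) prototypes; expert_true_probs: (N, E) probability each
    expert assigns to the prototype's true class."""
    weights = np.zeros((features.shape[1], expert_true_probs.shape[1]))
    for _ in range(steps):
        gate = softmax(features @ weights)
        mix = (gate * expert_true_probs).sum(axis=1, keepdims=True)
        posterior = gate * expert_true_probs / mix   # expert responsibilities
        # Gradient of the NLL with respect to gate logits is (gate - posterior).
        weights -= lr * features.T @ (gate - posterior) / len(features)
    return weights

# Two toy prototypes; expert 0 is reliable on the first, expert 1 on the second.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
expert_true_probs = np.array([[0.9, 0.1], [0.1, 0.9]])

loss_before = mixture_nll(np.zeros((2, 2)), prototypes, expert_true_probs)
gate_params = train_gate(prototypes, expert_true_probs)
loss_after = mixture_nll(gate_params, prototypes, expert_true_probs)
learned_gate = softmax(prototypes @ gate_params)
```

In this toy setup the learned gate shifts weight toward whichever expert is more reliable for each prototype, which is the behavior the paragraph above attributes to the meta model.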

Experimental Results and Performance

The researchers conducted extensive experiments on standard image and multimodal benchmarks. They reported that Mosaic consistently outperforms state-of-the-art approaches under both model and data heterogeneity. While the preprint does not disclose specific numeric improvements, it states that the framework achieves superior performance across multiple test scenarios. The source code has been published online to enable replication and further research.

Component	Function
Local generative models	Approximate each client's data distribution; generate privacy-preserving synthetic data
Mixture-of-Experts (MoE)	Combine specialized knowledge from heterogeneous client models
Lightweight meta model	Learn optimal weighting of expert predictions using representative prototypes

Implications for Enterprise AI

For enterprise technology leaders, Mosaic addresses a critical bottleneck in scaling federated learning across diverse environments. In supply chain and logistics, where data privacy regulations and heterogeneous edge devices are common, such a framework could enable collaborative model training without centralizing sensitive shipment or customer data. However, the paper focuses on image and multimodal benchmarks; real-world validation in trade or logistics contexts remains pending.

The data-free nature of Mosaic reduces dependence on public datasets, which are often not representative of proprietary enterprise data. By handling both model and data heterogeneity, the framework could simplify deployment across a fleet of different devices, from IoT sensors in warehouses to cloud servers running custom models.

Enterprise buyers evaluating federated learning solutions should note that Mosaic is a research contribution. Its performance beyond the image and multimodal benchmarks reported in the preprint, and at scale in production environments, is not yet documented. Nevertheless, its architectural innovations, particularly the use of local generative models and meta-learned expert weighting, offer a promising direction for data-efficient, privacy-preserving distributed AI.
