Federated learning, a technique for training machine learning models across decentralized data sources without centralizing raw data, has long been associated with exchanging model weights and gradients. However, modern deployments increasingly exchange other kinds of information, such as synthetic data or federated analytics results, that fall outside traditional definitions. A new paper posted on arXiv by Alvaro Javier Vargas Guerrero, Xinguang Wang, Quang Manh Doan, and Guy Nagels addresses this gap by proposing a formal mathematical definition of a federated message and a taxonomy that categorizes these exchanges.
The paper, titled "Beyond Weights and Gradients: A Taxonomy of Federated Learning Messages," defines a federated message that accounts for both utility and privacy. The authors organize modern payloads into three categories: model structures, statistical summaries, and data-conditioned representations. This framework aims to provide a clearer understanding of the trade-offs involved in decentralized training, particularly regarding computational demands, communication costs, and privacy risks.
A Formal Definition for Federated Messages
Existing definitions of federated learning often focus narrowly on weight or gradient updates. The new paper introduces a mathematical formulation that captures the full scope of modern payloads, including synthetic data and federated analytics. By formalizing what constitutes a federated message, the researchers provide a foundation for comparing different communication strategies across federated systems.
The Three-Category Taxonomy
The taxonomy proposed by the authors groups federated messages into three distinct types:
| Category | Description | Key Considerations |
|---|---|---|
| Model structures | Exchanges involving model weights, gradients, or parts of the model architecture. | High computational demands; well-studied privacy risks. |
| Statistical summaries | Aggregated statistics such as means, variances, or histograms computed from local data. | Lower communication costs; moderate privacy leakage. |
| Data-conditioned representations | Synthetic data, embeddings, or other representations derived from local data distribution. | Potentially high utility; privacy risks depend on representation fidelity. |
According to the paper, evaluating these categories based on computational demands, communication costs, and privacy risks helps practitioners choose appropriate messaging strategies for their hardware and security requirements.
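As a rough illustration of the three categories (not the paper's formal definition), the Python sketch below builds one toy message of each kind from a single client's local data. The names `MessageKind`, `FederatedMessage`, and the three builder functions are hypothetical, and the "model" is a one-parameter least-squares fit chosen only to keep the example short.

```python
import random
import statistics
from dataclasses import dataclass
from enum import Enum


class MessageKind(Enum):
    # Category names follow the taxonomy; the enum itself is illustrative.
    MODEL_STRUCTURE = "model structure"
    STATISTICAL_SUMMARY = "statistical summary"
    DATA_CONDITIONED = "data-conditioned representation"


@dataclass
class FederatedMessage:
    kind: MessageKind
    payload: dict


def model_message(xs, ys):
    # Fit y ~ w * x by least squares and send only the learned weight.
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return FederatedMessage(MessageKind.MODEL_STRUCTURE, {"w": w})


def summary_message(xs):
    # Send aggregate statistics instead of raw values.
    return FederatedMessage(
        MessageKind.STATISTICAL_SUMMARY,
        {
            "n": len(xs),
            "mean": statistics.fmean(xs),
            "var": statistics.pvariance(xs),
        },
    )


def synthetic_message(xs, n_samples, seed=0):
    # Send samples drawn from a Gaussian fitted to the local data
    # (a stand-in for a real synthetic-data generator).
    rng = random.Random(seed)
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return FederatedMessage(MessageKind.DATA_CONDITIONED, {"samples": samples})


# Toy local dataset held by one client (assumed values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

messages = [model_message(xs, ys), summary_message(xs), synthetic_message(xs, 3)]
for m in messages:
    print(m.kind.value, m.payload)
```

In all three cases the raw values in `xs` and `ys` stay on the client. What differs is how much of the local distribution each payload exposes and how much work the client does to produce it.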
Evaluating Trade-offs
The paper evaluates the three categories along three dimensions: computational demands, communication costs, and privacy risks. For example, model structures (such as weights) typically require significant computation to generate, though compression can reduce the cost of transmitting them. Statistical summaries are computationally lighter but can still leak information about the underlying data. Data-conditioned representations offer flexibility but introduce new privacy challenges. According to the authors, the formal definition of a federated message provides a mathematical lens for quantifying these trade-offs.
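To make the communication-cost dimension concrete, here is a back-of-the-envelope size estimate for one uncompressed message of each kind. It is a sketch under stated assumptions: values are 32-bit floats, and the parameter, feature, and sample counts are hypothetical numbers chosen for illustration, not figures from the paper.

```python
BYTES_PER_FLOAT32 = 4


def weights_bytes(num_parameters):
    # One float per model parameter.
    return num_parameters * BYTES_PER_FLOAT32


def summary_bytes(num_features, stats_per_feature=2):
    # For example, a mean and a variance per feature.
    return num_features * stats_per_feature * BYTES_PER_FLOAT32


def synthetic_bytes(num_samples, num_features):
    # One float per feature per synthetic sample.
    return num_samples * num_features * BYTES_PER_FLOAT32


# Hypothetical sizes: a 1M-parameter model, 100 features, 500 synthetic rows.
costs = {
    "model weights": weights_bytes(1_000_000),
    "statistical summary": summary_bytes(100),
    "synthetic data": synthetic_bytes(500, 100),
}

for name, size in costs.items():
    print(f"{name}: {size:,} bytes")
```

The ordering here follows entirely from the assumed sizes. A small model or a large synthetic dataset would reverse it, which is why the payload type has to be assessed against the actual deployment.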
A Field in Flux: The Shift Since 2021
To ground their taxonomy in real research trends, the authors reviewed 202 recent publications on federated learning. Their analysis reveals a significant shift since 2021 away from standard deep learning updates and toward more diverse, specialized messaging paradigms. This trend underscores the need for a structured taxonomy: as more researchers and practitioners explore alternatives to plain weight or gradient exchange, a common vocabulary becomes essential for comparing approaches and advancing the field.
Implications for Federated System Design
For enterprise technology decision-makers evaluating federated learning platforms, the taxonomy offers a framework to assess which type of message exchange best suits their infrastructure. Organizations with limited bandwidth might favor statistical summaries; those with strict privacy requirements might prioritize data-conditioned representations with formal privacy guarantees. The paper's review of 202 publications also suggests that the ecosystem of federated communication strategies is expanding, meaning off-the-shelf solutions may need to support multiple message types to stay competitive. The research provides a structured path for optimizing federated systems for varying hardware and security requirements, as stated in the abstract.