Federated fine-tuning of large language models (LLMs) with parameter-efficient methods like LoRA (Low-Rank Adaptation) enables privacy-preserving adaptation of foundation models. However, heterogeneous hardware introduces a critical challenge: adapters trained at different ranks cannot be averaged directly. Existing methods that do allow aggregation under heterogeneous ranks fail to control how information is distributed across rank dimensions, leading to suboptimal use of shared low-rank representations. To address this, researchers from multiple institutions have proposed PreLort, a nested low-rank formulation for federated LoRA that organizes adapter dimensions into a prefix hierarchy.
The Challenge of Heterogeneous Ranks
In federated learning, clients often have different computational capabilities, so they fine-tune LLMs with LoRA adapters of different ranks. Zero-padding the lower-rank adapters to a common rank and averaging them directly dilutes the higher-rank dimensions, because clients that never trained those dimensions contribute zeros to the average. According to the paper, existing heterogeneous federated LoRA methods also do not control how information is distributed across rank dimensions, causing suboptimal use of shared low-rank representations. PreLort addresses this by encouraging lower-rank dimensions to encode the most task-relevant information while higher-rank dimensions capture additional capacity.
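To make the dilution concrete, the following minimal sketch zero-pads adapters of different ranks to a common size and averages them. The three clients, their ranks, and the all-ones values are made up for illustration and do not come from the paper.

```python
import numpy as np

def zero_pad_average(client_mats, max_rank):
    """Naive baseline: zero-pad each client's A matrix to max_rank rows, then average."""
    padded = []
    for a in client_mats:
        pad = np.zeros((max_rank - a.shape[0], a.shape[1]))
        padded.append(np.vstack([a, pad]))
    return np.mean(padded, axis=0)

# Three hypothetical clients with ranks 2, 2 and 4; every trained entry is 1.0.
clients = [np.ones((2, 3)), np.ones((2, 3)), np.ones((4, 3))]
avg = zero_pad_average(clients, max_rank=4)

print(avg[:2])  # rows trained by all three clients keep their value of 1.0
print(avg[2:])  # rows trained by one client shrink to 1/3 of their value
```

The last two rows were trained only by the rank-4 client, yet they are divided by three because the padded zeros of the other two clients are counted in the average.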
How PreLort Works
PreLort introduces three key components that together encourage a consistent low-rank prefix capturing the most task-relevant information, while higher-rank dimensions learn additional capacity. The first is a segment-wise aggregation rule that averages only over clients contributing to each rank segment, avoiding dilution from zero-padded lower-rank clients. The second is a prefix-nested training strategy that optimizes each adapter under multiple rank truncations, encouraging useful signal to concentrate in low-rank prefix dimensions. The third is the overall nested low-rank formulation that organizes adapter dimensions into a prefix hierarchy. These components allow low-rank clients to benefit from richer information contributed by higher-rank clients, as prefix dimensions are consistently learned and aggregated.
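The segment-wise aggregation rule can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation: it assumes uniform client weights and that each LoRA factor is aggregated separately, and the function name `segment_wise_average` is hypothetical.

```python
import numpy as np

def segment_wise_average(client_mats, axis=0):
    """Average each rank index only over the clients that actually hold it.

    client_mats: list of arrays whose size along `axis` is the client's rank
                 (rows of A when axis=0, columns of B when axis=1).
    """
    max_rank = max(m.shape[axis] for m in client_mats)
    # Move the rank dimension to the front to keep the slicing simple.
    mats = [np.moveaxis(m, axis, 0) for m in client_mats]
    total = np.zeros((max_rank,) + mats[0].shape[1:])
    counts = np.zeros(max_rank)
    for m in mats:
        r = m.shape[0]
        total[:r] += m    # a client contributes to its own prefix only
        counts[:r] += 1   # number of clients covering each rank index
    averaged = total / counts.reshape((-1,) + (1,) * (total.ndim - 1))
    return np.moveaxis(averaged, 0, axis)

# Hypothetical clients: two of rank 2 and one of rank 4.
a_clients = [np.ones((2, 3)), np.ones((2, 3)), 3 * np.ones((4, 3))]
a_global = segment_wise_average(a_clients, axis=0)
print(a_global)
```

In this toy example the first two rows are averaged over all three clients, while the last two rows keep the rank-4 client's values unchanged, since no other client contributed to that segment.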
| Component | Description | Benefit |
|---|---|---|
| Segment-wise aggregation | Averages only over clients contributing to each rank segment | Avoids dilution from zero-padded lower-rank clients |
| Prefix-nested training | Optimizes each adapter under multiple rank truncations | Encourages useful signal to concentrate in low-rank prefix dimensions |
| Nested low-rank formulation | Organizes adapter dimensions into a prefix hierarchy | Encourages lower-rank dimensions to encode task-relevant information while higher-rank dimensions capture additional capacity |
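The prefix-nested training idea can be illustrated with a small sketch that evaluates one adapter under several rank truncations and averages the resulting losses. Everything specific here is an assumption for illustration: the mean squared error (LLM fine-tuning would normally use a cross-entropy loss), the truncation ranks, the equal weighting of truncations, and the function name `nested_prefix_loss`. The paper's actual objective may differ.

```python
import numpy as np

def nested_prefix_loss(B, A, W0, x, y, ranks):
    """Mean squared error averaged over several rank truncations of one adapter."""
    losses = []
    for r in ranks:
        delta = B[:, :r] @ A[:r, :]   # adapter truncated to its first r dimensions
        pred = x @ (W0 + delta).T     # frozen base weight plus low-rank update
        losses.append(np.mean((pred - y) ** 2))
    return float(np.mean(losses))

# Toy shapes and random data, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
d_out, d_in, full_rank = 4, 6, 4
W0 = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(full_rank, d_in))
B = rng.normal(size=(d_out, full_rank))
x = rng.normal(size=(8, d_in))
y = rng.normal(size=(8, d_out))

loss = nested_prefix_loss(B, A, W0, x, y, ranks=(1, 2, 4))
print(loss)
```

Because every truncation includes the first rank dimension, minimizing such an averaged loss puts the most pressure on the leading dimensions, which is one way to encourage useful signal to concentrate in the prefix. A real training loop would compute gradients of this loss with an automatic differentiation framework rather than plain NumPy.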
Experimental Results
According to the paper, "our method consistently outperforms prior heterogeneous federated LoRA methods in accuracy and ROUGE-L, while achieving lower or comparable perplexity across multiple base models." ROUGE-L scores generated text by the longest common subsequence it shares with a reference text, and higher values are better. Perplexity measures how well a model predicts held-out text, and lower values are better.
Implications for Enterprise AI
For enterprise technology decision-makers, PreLort represents a step toward more effective federated learning deployments. Where edge devices or regional servers have varying hardware capabilities, as is common in global supply chains and logistics, aggregating adapters of different ranks with less dilution could improve model performance without centralizing sensitive data. The research is still at the academic stage, but its focus on rank heterogeneity addresses a practical barrier to deploying federated LLM fine-tuning across mixed hardware.
The authors of the paper are Muhammad Waseem, Nurbek Tastan, Andrej Jovanovic, Nicholas D. Lane, Nils Lukas, Karthik Nandakumar, and Samuel Horvath. The work is available on arXiv, where it is listed under the computer science category Distributed, Parallel, and Cluster Computing (cs.DC).