Topic
mixture of experts
Artificial Intelligence #artificial intelligence#machine learning
SPRI: SVD-Partitioned Residual Initialization Boosts Data-Constrained MoE Upcycling for Multilingual Translation
Researchers propose SPRI, a method that initializes Mixture-of-Experts (MoE) models from pretrained dense models using SVD-partitioned residuals. Evaluated on multilingual speech-to-text translation, SPRI achieves gains of 2.58 BLEU and 3.32 COMET over fine-tuned dense models, and outperforms prior MoE upcycling baselines by 3.39 BLEU and 4.34 COMET points.
Jun 16, 2026 1 source
Artificial Intelligence #tied expert layers#mixture-of-experts
Expert Tying Reduces Memory Footprint of Mixture-of-Experts LLMs by Nearly Half
A new arXiv paper from Jaggi proposes Expert Tying, an architectural modification for Mixture-of-Experts LLMs that shares expert parameters across consecutive transformer layers. Pretraining experiments show memory footprint reduction by almost 2x with virtually no degradation in perplexity or downstream quality, evaluated on OLMoE, Qwen3, and DeepSeek-style architectures.
Jun 16, 2026 1 source