Modern machine learning increasingly manipulates probability measures, from empirical datasets and generated samples to latent distributions and attention patterns. Comparing these objects in a statistically meaningful way is a core challenge. A new book posted on arXiv, 'Optimal Transport for Machine Learners' by Gabriel Peyré, presents optimal transport (OT) as a unified language for losses, generative modeling, domain adaptation, robust learning, barycenters, gradient flows, and mean-field descriptions of learning algorithms.
According to the abstract, the book is written with machine-learning uses in mind. It starts from finite assignment and the Monge map viewpoint, then moves to Kantorovich couplings and dual potentials. It systematically explains the algorithmic ideas that make transport computable in practice: linear programming, semi-discrete cells, Sinkhorn scaling, and low-dimensional projections.
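Sinkhorn scaling is the easiest of these ideas to try. The sketch below assumes the standard entropic formulation: alternately rescale the rows and columns of the Gibbs kernel `K = exp(-C / eps)` until the coupling matches both marginals. The function name `sinkhorn`, the example points, and the stopping rule are illustrative choices, not taken from the book.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iters=2000, tol=1e-9):
    """Entropic OT between histograms a (n,) and b (m,) with ground cost C (n, m).

    Returns the coupling P = diag(u) K diag(v) with Gibbs kernel K = exp(-C / eps).
    """
    K = np.exp(-C / eps)                  # Gibbs kernel; underflows if eps is very small
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                   # rescale rows to match marginal a
        v = b / (K.T @ u)                 # rescale columns to match marginal b
        row_err = np.abs(u * (K @ v) - a).max()
        if row_err < tol:                 # columns are exact right after the v-update
            break
    return u[:, None] * K * v[None, :]

# Usage: three points on the line against three nearby points, uniform weights.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.2, 1.3, 2.1])
a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
C = (x[:, None] - y[None, :]) ** 2        # squared Euclidean ground cost
P = sinkhorn(a, b, C, eps=0.5)
print(P.round(3))
print("transport cost:", (P * C).sum())
```

A larger `eps` gives a blurrier plan and faster convergence; shrinking it moves the plan toward the unregularized optimum but needs more iterations and eventually makes `exp(-C / eps)` underflow.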
Key Techniques Covered
The same objects are then reused to build a geometry on the space of measures, giving Wasserstein distances, barycenters, gradient flows, dynamic formulations, and closed-form Gaussian/Bures formulas. The final chapters emphasize the variants most relevant to modern ML: divergences and adversarial losses, entropic and unbalanced relaxations, robust or spectral ground geometries, Gromov and quantum extensions, and transport-based views of generative models, mean-field networks, and attention dynamics.
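The Gaussian/Bures formulas concern the case where both measures are Gaussian, in which the 2-Wasserstein distance has a well-known closed form: `W2^2 = |m1 - m2|^2 + tr(S1 + S2 - 2 (S1^(1/2) S2 S1^(1/2))^(1/2))`, where the trace term is the squared Bures distance between the covariances. The sketch below evaluates it with NumPy; the helper names are hypothetical, and the matrix square root is taken by eigendecomposition, assuming symmetric positive semi-definite covariances.

```python
import numpy as np

def sqrtm_psd(S):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)             # clip tiny negative eigenvalues caused by round-off
    return (V * np.sqrt(w)) @ V.T

def gaussian_w2_squared(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r1 = sqrtm_psd(S1)
    cross = sqrtm_psd(r1 @ S2 @ r1)       # (S1^(1/2) S2 S1^(1/2))^(1/2)
    bures2 = np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sum((m1 - m2) ** 2) + bures2)

# Usage: for diagonal covariances the formula reduces to
# |m1 - m2|^2 + sum_k (sqrt(s1_k) - sqrt(s2_k))^2 = 1 + 1 + 1 here.
m1, S1 = np.zeros(2), np.diag([1.0, 4.0])
m2, S2 = np.array([1.0, 0.0]), np.diag([4.0, 9.0])
print(gaussian_w2_squared(m1, S1, m2, S2))
```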
| Technique | Purpose in ML |
|---|---|
| Linear programming | Solve discrete transport problems exactly; assignment is the special case of equal-size, uniformly weighted samples |
| Sinkhorn scaling | Efficiently approximate optimal transport with entropic regularization |
| Wasserstein distances | Compare probability measures with a metric that reflects the geometry of the underlying space |
| Barycenters | Average or interpolate between multiple distributions |
| Gradient flows | Describe the evolution of measures under variational dynamics |
| Entropic relaxations | Smooth transport plans, enabling fast and differentiable solvers |
| Gromov-Wasserstein | Match distributions living in different spaces (e.g. of different dimensions) using only within-space distances |
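The finite assignment problem can be made concrete on a toy example. This sketch, with hypothetical helper names, solves a four-point assignment exactly by brute force and checks it against the 1-D shortcut: on the real line with a convex cost such as `|x - y|^2`, matching sorted points is optimal. Brute force is used only for transparency; realistic sizes call for linear programming or Hungarian-type solvers.

```python
from itertools import permutations

def assignment_bruteforce(x, y, cost):
    """Exact optimal assignment between equal-size point sets by enumerating all n! permutations.

    Returns (total cost, perm), where point x[i] is sent to y[perm[i]].
    """
    n = len(x)
    return min(
        (sum(cost(x[i], y[perm[i]]) for i in range(n)), perm)
        for perm in permutations(range(n))
    )

def sorted_matching_cost(x, y, p=2):
    """Total cost of matching sorted x to sorted y under |s - t|**p.

    On the real line with a convex cost (p >= 1), this monotone matching is optimal.
    """
    return sum(abs(s - t) ** p for s, t in zip(sorted(x), sorted(y)))

x = [3.0, 0.0, 1.5, 4.0]
y = [1.0, 2.0, 5.0, 0.5]
best_cost, best_perm = assignment_bruteforce(x, y, lambda s, t: (s - t) ** 2)
print(best_cost, best_perm)          # exhaustive search over 4! = 24 matchings
print(sorted_matching_cost(x, y))    # same cost, found by sorting alone
# With uniform weights 1/n, the squared 2-Wasserstein distance is best_cost / n.
```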
Relevance to Machine Learning
The book aims to keep the mathematics explicit while exposing the computational and geometric intuitions needed to turn OT into a working toolbox for machine learners. It argues that optimal transport combines a statistically meaningful notion of discrepancy with a geometry of interpolation, dual certificates, and variational dynamics. This makes OT a common language for many ML tasks, including generative modeling (e.g., Wasserstein GANs), domain adaptation (aligning source and target distributions), and robust learning (handling distribution shift).
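As a concrete OT-based discrepancy between two sample sets, for instance generated and real data, the sketch below estimates the sliced Wasserstein distance, a common instance of the low-dimensional projection idea: project both samples onto random directions and solve each 1-D problem by sorting. Whether the book uses exactly this estimator is an assumption; the function name and defaults are illustrative.

```python
import numpy as np

def sliced_wasserstein2(X, Y, n_projections=200, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    X and Y are (n, d) arrays with the same number of samples and uniform weights.
    Each random direction reduces the problem to 1-D, where sorting is optimal.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform random unit directions
    px = np.sort(X @ theta.T, axis=0)  # (n, n_projections): projected, sorted per direction
    py = np.sort(Y @ theta.T, axis=0)
    # Mean over samples gives the 1-D W2^2 per direction; mean over directions averages them.
    return float(np.mean((px - py) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
Y = X + np.array([2.0, 0.0])       # translated copy of X
print(sliced_wasserstein2(X, X))   # identical samples
print(sliced_wasserstein2(X, Y))   # near |shift|^2 / d = 2 for a pure translation
```

Because each projection only needs a sort, the cost is O(n log n) per direction, which is why such slices are attractive as training losses.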
Implications for Enterprise AI
For CTOs and technology leaders, optimal transport is relevant wherever an AI system relies on distribution matching, such as anomaly detection, data augmentation, and fairness auditing. The book also offers transport-based views of modern architectures, including attention dynamics and mean-field descriptions of neural networks. Although the book is mathematical, its attention to algorithms such as Sinkhorn scaling should make it useful to practitioners who need to integrate OT into production systems.
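One practical issue when putting Sinkhorn into production is numerical stability: for small `eps`, `exp(-C / eps)` underflows. A common remedy, sketched below, is to iterate on the dual potentials `f` and `g` with a log-sum-exp, exponentiating only at the end. The function names are hypothetical, and this is one standard stabilization rather than the book's specific implementation.

```python
import numpy as np

def logsumexp(M, axis):
    """Numerically stable log(sum(exp(M))) along one axis."""
    m = np.max(M, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(M - m), axis=axis))

def sinkhorn_log(a, b, C, eps, n_iters=500):
    """Log-domain Sinkhorn on dual potentials f (n,) and g (m,).

    The coupling is P_ij = exp((f_i + g_j - C_ij) / eps); each update makes
    one marginal of P exact, mirroring the u/v scalings with u = exp(f / eps).
    """
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    for _ in range(n_iters):
        f = eps * np.log(a) - eps * logsumexp((g[None, :] - C) / eps, axis=1)  # fix row sums
        g = eps * np.log(b) - eps * logsumexp((f[:, None] - C) / eps, axis=0)  # fix column sums
    return np.exp((f[:, None] + g[None, :] - C) / eps)

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.2, 1.3, 2.1])
a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
C = (x[:, None] - y[None, :]) ** 2
P = sinkhorn_log(a, b, C, eps=1e-4)
print(P.round(4))
```

On this example, computing `exp(-C / eps)` directly with `eps = 1e-4` would round every entry of the middle row to zero (even the largest is exp(-900)), whereas the log-domain iterates stay finite and recover the diagonal plan.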
The book is available on arXiv. As machine learning models become more complex, a rigorous framework for comparing distributions is increasingly valuable across industries.