Optimal and Diffusion Transports in Machine Learning
By: Gabriel Peyré
Potential Business Impact:
Helps AI learn and generate data better by treating both as moving probability distributions with math.
Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. These include sampling via diffusion methods, optimizing the weights of neural networks, and analyzing the evolution of token distributions across layers of large language models. While the targeted applications differ (samples, weights, tokens), their mathematical descriptions share a common structure. A key idea is to switch from the Eulerian representation of densities to their Lagrangian counterpart through vector fields that advect particles. This dual view introduces challenges, notably the non-uniqueness of Lagrangian vector fields, but also opportunities to craft density evolutions and flows with favorable properties in terms of regularity, stability, and computational tractability. This survey presents an overview of these methods, with emphasis on two complementary approaches: diffusion methods, which rely on stochastic interpolation processes and underpin modern generative AI, and optimal transport, which defines interpolation by minimizing displacement cost. We illustrate how both approaches appear in applications ranging from sampling and neural network optimization to modeling the dynamics of transformers for large language models.
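For readers who want to see the objects behind this Eulerian/Lagrangian switch, the following is a minimal sketch in standard notation; the symbols (ρ_t, v_t, α_t, σ_t) and the specific formulations are the usual ones from the literature, not quoted from the survey itself.

% Eulerian view: the density rho_t obeys a continuity equation driven by a velocity field v_t
\[
  \partial_t \rho_t + \operatorname{div}(\rho_t\, v_t) = 0 .
\]
% Lagrangian view: the same evolution is realized by advecting particles along v_t
\[
  \frac{d}{dt} x_t = v_t(x_t), \qquad x_0 \sim \rho_0 \;\Longrightarrow\; x_t \sim \rho_t .
\]
% Diffusion / stochastic-interpolation flavor: blend data x_1 with noise epsilon,
% and take the conditional-expectation velocity that transports the resulting marginals
\[
  x_t = \alpha_t\, x_1 + \sigma_t\, \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I),
  \qquad
  v_t(x) = \mathbb{E}\big[\dot\alpha_t\, x_1 + \dot\sigma_t\, \varepsilon \,\big|\, x_t = x\big].
\]
% Optimal-transport flavor (Benamou--Brenier): among all pairs (rho_t, v_t) linking rho_0 to rho_1,
% pick the one minimizing kinetic energy, i.e. squared displacement cost
\[
  W_2^2(\rho_0, \rho_1)
  = \min_{(\rho_t, v_t)} \int_0^1 \!\!\int \|v_t(x)\|^2\, \rho_t(x)\, dx\, dt
  \quad \text{s.t.} \quad \partial_t \rho_t + \operatorname{div}(\rho_t\, v_t) = 0 .
\]

The non-uniqueness mentioned above is visible here: many velocity fields v_t transport the same family of marginals ρ_t, and stochastic interpolation and optimal transport are two principled ways of selecting one.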
Similar Papers
Optimal Transport for Machine Learners
Machine Learning (Stat)
Helps computers learn by comparing data.
Control, Optimal Transport and Neural Differential Equations in Supervised Learning
Numerical Analysis
Teaches computers to move data smoothly and efficiently.
The Principles of Diffusion Models
Machine Learning (CS)
Creates new pictures and sounds from noise.