Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
By: Parsa Rangriz
Potential Business Impact:
Explains how noise in training shapes how quickly AI learns.
This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Building on the seminal work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD corresponding to the gradient flow of the population loss, we focus on the critical scaling regime of the step size. Below this critical scale, the effective dynamics are governed by ballistic (ODE) limits, but at the critical scale a new correction term appears that changes the phase diagram. In this regime, near the fixed points, the corresponding diffusive (SDE) limits of the effective dynamics reduce to an Ornstein-Uhlenbeck process under certain conditions. These results highlight how the information exponent controls sample complexity and illustrate the limitations of deterministic scaling limits in capturing the stochastic fluctuations of high-dimensional learning dynamics.
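To make the two regimes concrete, here is a minimal sketch of the generic form such limits take, assuming a single-index setting in which the effective dynamics are summarized by an overlap (summary statistic) between the learned weights and the teacher direction; the symbols m_t, X_t, lambda, sigma, and B_t below are illustrative choices, not notation taken from the paper.

\[
\frac{dm_t}{dt} = -\nabla \mathcal{L}(m_t) \qquad \text{(ballistic / ODE limit, sub-critical step size)}
\]

\[
dX_t = -\lambda X_t \, dt + \sigma \, dB_t \qquad \text{(diffusive / OU limit near a fixed point, critical step size)}
\]

Under this reading, the stationary variance of the Ornstein-Uhlenbeck process, \(\sigma^2 / (2\lambda)\), is the residual fluctuation level that the deterministic ODE limit cannot capture, which is the gap the abstract points to.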
Similar Papers
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Machine Learning (Stat)
Improves computer learning by making it more stable.
Exact Dynamics of Multi-class Stochastic Gradient Descent
Machine Learning (Stat)
Helps computers learn better from messy data.
Emergence and scaling laws in SGD learning of shallow neural networks
Machine Learning (CS)
Teaches computers to learn complex patterns faster.