The Operator Origins of Neural Scaling Laws: A Generalized Spectral Transport Dynamics of Deep Learning
By: Yizhou Zhang
Potential Business Impact:
Explains why AI models predictably learn better as they get bigger and train longer.
Modern deep networks operate in a rough, finite-regularity regime where Jacobian-induced operators exhibit heavy-tailed spectra and strong basis drift. In this work, we derive a unified operator-theoretic description of neural training dynamics directly from gradient descent. Starting from the exact evolution $\dot e_t = -M(t)e_t$ in function space, we apply Kato perturbation theory to obtain a rigorous system of coupled mode ODEs and show that, after coarse-graining, these dynamics converge to a spectral transport-dissipation PDE \[ \partial_t g + \partial_\lambda(v g) = -\lambda g + S, \] where $v$ captures eigenbasis drift and $S$ encodes nonlocal spectral coupling. We prove that neural training preserves functional regularity, forcing the drift to take the asymptotic power-law form $v(\lambda,t)\sim -c(t)\lambda^{b}$. In the weak-coupling regime, which arises naturally from spectral locality and SGD noise, the PDE admits self-similar solutions with a resolution frontier, polynomial amplitude growth, and power-law dissipation. This structure yields explicit scaling-law exponents, explains the geometry of double descent, and shows that the effective training time satisfies $\tau(t)=t^{\alpha}L(t)$ for slowly varying $L$. Finally, we show that NTK training and feature learning arise as two limits of the same PDE: $v\equiv 0$ recovers lazy dynamics, while $v\neq 0$ produces representation drift. Our results provide a unified spectral framework connecting operator geometry, optimization dynamics, and the universal scaling behavior of modern deep networks.
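To make the weak-coupling dynamics concrete, here is a minimal numerical sketch (not the paper's implementation) that integrates the transport-dissipation PDE with the asymptotic power-law drift $v(\lambda)=-c\lambda^{b}$, zero source $S$, and a heavy-tailed initial density $g_0(\lambda)\propto\lambda^{-a}$. The exponent values $a$, $b$, the constant $c$, the grid, and the total-mass loss proxy are all illustrative assumptions; a first-order upwind scheme is enough to exhibit power-law dissipation, and setting $c=0$ switches off the drift and recovers the lazy ($v\equiv 0$) limit $g(\lambda,t)=g_0(\lambda)e^{-\lambda t}$.

```python
import numpy as np

# Sketch of the spectral transport-dissipation PDE from the abstract:
#     dg/dt + d/dlam(v g) = -lam g,   with source S = 0 assumed,
# using the asymptotic power-law drift v(lam) = -c * lam**b.
# The values of a, b, c and the grid are illustrative assumptions.

a, b, c = 0.5, 1.0, 0.3                 # hypothetical tail / drift exponents
n = 1000
edges = np.linspace(0.0, 1.0, n + 1)    # eigenvalue grid, lam in [0, 1]
lam = 0.5 * (edges[:-1] + edges[1:])    # cell centers
dlam = edges[1] - edges[0]

g = lam ** (-a)                         # heavy-tailed initial spectral density
v_edge = -c * edges ** b                # drift at cell interfaces (v <= 0)
# Time step limited by both the transport CFL condition and dissipation.
dt = min(0.4 * dlam / max(np.abs(v_edge).max(), 1e-12), 0.5 / lam.max())

def rhs(g):
    # First-order upwind flux: v < 0, so the upwind cell is the one to the
    # right of each interface; no inflow through the lam = 1 boundary.
    flux = np.zeros(n + 1)
    flux[:-1] = v_edge[:-1] * g
    return -(flux[1:] - flux[:-1]) / dlam - lam * g

times, loss, t = [], [], 0.0
for step in range(30_000):
    g = g + dt * rhs(g)                 # explicit Euler time step
    t += dt
    if step % 500 == 0:
        times.append(t)
        loss.append(g.sum() * dlam)     # crude loss proxy: total spectral mass

# Late-time power-law fit of the loss curve in log-log coordinates.
slope = np.polyfit(np.log(times[10:]), np.log(loss[10:]), 1)[0]
print(f"fitted dissipation exponent ~ {slope:.2f}")
```

In the $c=0$ case the fitted exponent can be checked against the closed-form Laplace-type integral $\int_0^1 \lambda^{-a} e^{-\lambda t}\,d\lambda \sim \Gamma(1-a)\,t^{a-1}$, the standard source-condition calculation behind NTK-style scaling laws; turning the drift back on ($c>0$) transports spectral mass toward small $\lambda$ and shifts the measured exponent away from the lazy value.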
Similar Papers
An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models
Machine Learning (CS)
Teaches computers how to draw realistic pictures faster.
Scaling Laws are Redundancy Laws
Machine Learning (CS)
Explains why bigger computer brains learn faster.
Emergence and scaling laws in SGD learning of shallow neural networks
Machine Learning (CS)
Teaches computers to learn complex patterns faster.