A Minimalist Example of Edge-of-Stability and Progressive Sharpening
By: Liming Liu, Zixuan Zhang, Simon Du, and more
Potential Business Impact:
Teaches computers to learn faster and more reliably.
Recent advances in deep learning optimization have unveiled two intriguing phenomena under large learning rates: Edge of Stability (EoS) and Progressive Sharpening (PS), challenging classical Gradient Descent (GD) analyses. Current research approaches, using either generalist frameworks or minimalist examples, face significant limitations in explaining these phenomena. This paper advances the minimalist approach by introducing a two-layer network with a two-dimensional input, where one dimension is relevant to the response and the other is irrelevant. Through this model, we rigorously prove the existence of progressive sharpening and self-stabilization under large learning rates, and establish a non-asymptotic analysis of the training dynamics and sharpness along the entire GD trajectory. In addition, we connect our minimalist example to existing work by reconciling the existence of a well-behaved "stable set" between minimalist and generalist analyses, and by extending the analysis of Gradient Flow Solution sharpness to our two-dimensional input scenario. These findings provide new insights into the EoS phenomenon from both parameter and input data distribution perspectives, potentially informing more effective optimization strategies in deep learning practice.
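The setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact construction: it uses a hypothetical two-layer linear model f(x) = a (w · x), synthetic 2-D Gaussian inputs whose second coordinate is irrelevant to the response, and an illustrative step size; the sharpness (top Hessian eigenvalue of the loss) is estimated by finite differences and compared against the stability threshold 2/eta as training proceeds.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact model):
# a two-layer model f(x) = a * (w . x) trained by full-batch GD on 2-D inputs
# whose second coordinate is irrelevant to the response. We track the
# sharpness (largest Hessian eigenvalue of the loss) to watch it grow during
# training (progressive sharpening) toward the stability threshold 2/eta;
# pushing eta larger drives the run into the Edge-of-Stability regime.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the response depends only on the first input coordinate.
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0]                       # second coordinate is irrelevant

def loss_and_grads(a, w):
    """Mean squared error of f(x) = a * (w . x) and its gradients."""
    pred = a * (X @ w)
    resid = pred - y
    loss = 0.5 * np.mean(resid ** 2)
    g_a = np.mean(resid * (X @ w))      # d loss / d a
    g_w = a * (X.T @ resid) / n         # d loss / d w
    return loss, g_a, g_w

def sharpness(a, w, eps=1e-4):
    """Largest Hessian eigenvalue of the loss, via finite differences of the gradient."""
    theta = np.concatenate(([a], w))
    d = theta.size
    def grad_vec(t):
        _, ga, gw = loss_and_grads(t[0], t[1:])
        return np.concatenate(([ga], gw))
    g0 = grad_vec(theta)
    H = np.zeros((d, d))
    for i in range(d):
        t = theta.copy()
        t[i] += eps
        H[:, i] = (grad_vec(t) - g0) / eps
    H = 0.5 * (H + H.T)                 # symmetrize the numerical Hessian
    return np.max(np.linalg.eigvalsh(H))

# Full-batch GD with a deliberately large (but still convergent) step size.
eta = 0.6
a, w = 0.3, np.array([0.2, 0.2])
for step in range(200):
    loss, g_a, g_w = loss_and_grads(a, w)
    a, w = a - eta * g_a, w - eta * g_w
    if step % 20 == 0:
        print(f"step {step:4d}  loss {loss:.4f}  "
              f"sharpness {sharpness(a, w):.3f}  2/eta {2 / eta:.3f}")
```

In this toy run the sharpness climbs toward 2/eta as the fit improves, which is the qualitative behavior the paper analyzes rigorously for its own minimalist example; the irrelevant input direction is included so the sketch mirrors the two-dimensional relevant/irrelevant setup described in the abstract.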
Similar Papers
Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
Machine Learning (CS)
Makes computer brains learn better and faster.
Variational Learning Finds Flatter Solutions at the Edge of Stability
Machine Learning (Stat)
Makes computer learning find simpler, better answers.
Convergence Rates for Gradient Descent on the Edge of Stability in Overparametrised Least Squares
Machine Learning (CS)
Helps computers learn faster by finding better solutions.