Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
By: Geonhui Yoo, Minhak Song, Chulhee Yun
Potential Business Impact:
Shows how training data, network depth, and optimizer settings affect how stably and quickly neural networks learn.
When training deep neural networks with gradient descent, sharpness -- the largest eigenvalue of the loss Hessian -- often increases, a phenomenon known as progressive sharpening, before saturating at the edge of stability. Although commonly observed in practice, the mechanisms underlying progressive sharpening remain poorly understood. In this work, we study the phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this model effectively captures the sharpness dynamics observed in recent empirical studies, offering a tractable testbed for understanding neural network training. Moreover, we theoretically analyze how dataset properties, network depth, optimizer stochasticity, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study offers a deeper understanding of sharpness dynamics in neural network training, highlighting the interplay between depth, training data, and optimizers.
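To make the setup concrete, here is a minimal sketch (not the authors' code) of the single-neuron-per-layer deep linear network trained by full-batch gradient descent on one scalar example, tracking sharpness as the largest Hessian eigenvalue. The data (x, y), depth D, step size eta, and initialization are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: depth-D linear network with one neuron per layer,
# f(w) = w_D * ... * w_1 * x, trained by gradient descent on a single
# scalar example (x, y). All hyperparameters below are assumptions.
import jax
import jax.numpy as jnp

D = 5          # number of layers (one scalar weight each)
eta = 0.15     # step size; the edge of stability would sit near 2/eta
x, y = 1.0, 2.0

def loss(w):
    # Squared loss of the product network on the single example (x, y).
    return 0.5 * (jnp.prod(w) * x - y) ** 2

grad_fn = jax.jit(jax.grad(loss))
hess_fn = jax.jit(jax.hessian(loss))

w = jnp.full((D,), 0.5)  # balanced initialization
for step in range(1000):
    # Sharpness = largest eigenvalue of the (D x D) loss Hessian.
    sharpness = jnp.linalg.eigvalsh(hess_fn(w))[-1]
    if step % 100 == 0:
        print(f"step {step:4d}  loss {float(loss(w)):8.4f}  "
              f"sharpness {float(sharpness):7.3f}  (2/eta = {2/eta:.1f})")
    w = w - eta * grad_fn(w)
```

Under this setup one would expect the printed sharpness to rise during training (progressive sharpening) and, once it nears 2/eta, to hover there rather than keep growing (edge of stability); knobs such as the depth D and the target magnitude |y| are the kinds of quantities the paper's analysis relates to the degree of sharpening.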
Similar Papers
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Machine Learning (CS)
Gives a simple model showing why training sharpness grows and then stabilizes.
Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization
Machine Learning (CS)
Studies how sharpness-aware training behaves on simple linear networks.
From Sharpness to Better Generalization for Speech Deepfake Detection
Audio and Speech Processing
Makes fake voice detectors work on new voices.