Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
By: Haocheng Luo, Mehrtash Harandi, Dinh Phung, and more
Potential Business Impact:
Helps machine learning models generalize better, so they stay accurate on new data.
Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size used to compute perturbations decreases. Leveraging an extended stochastic differential equation (SDE) framework combined with an analysis of the structure of stochastic gradient noise (SGN), we precisely characterize the dynamics of various SAM variants. Our findings reveal that the stochastic noise introduced during SAM perturbations inherently induces a variance-based sharpness regularization effect. Motivated by these theoretical insights, we introduce Reweighted SAM, which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate our theoretical analysis and the effectiveness of the proposed method.
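To make the m-sharpness setup concrete, below is a minimal sketch of one m-SAM training step, assuming a standard PyTorch model and scalar loss. The function name `msam_step` and the hyperparameters `rho` (perturbation radius) and `m` (micro-batch size) are illustrative assumptions, not the paper's released implementation, and the reweighted variant is omitted.

```python
# A minimal m-SAM sketch (assumed PyTorch setup, not the authors' code):
# the SAM perturbation is computed independently on each micro-batch of
# size m, and the resulting perturbed gradients are averaged.
import torch

def msam_step(model, loss_fn, optimizer, inputs, targets, rho=0.05, m=32):
    optimizer.zero_grad()
    params = [p for p in model.parameters() if p.requires_grad]

    for x, y in zip(inputs.split(m), targets.split(m)):
        # 1) Gradient of the micro-batch loss at the current weights.
        grads = torch.autograd.grad(loss_fn(model(x), y), params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))

        # 2) Ascend to the worst-case point within radius rho.
        eps = [rho * g / (norm + 1e-12) for g in grads]
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)

        # 3) Gradient at the perturbed weights, accumulated into .grad.
        loss_fn(model(x), y).backward()

        # 4) Undo the perturbation before the next micro-batch.
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)

    # Average the accumulated micro-batch gradients, then update.
    n_micro = (inputs.size(0) + m - 1) // m
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.grad.div_(n_micro)
    optimizer.step()
```

Decreasing `m` injects more perturbation noise per micro-batch, which the paper's analysis links to a stronger variance-based sharpness penalty and hence better generalization.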
Similar Papers
Sharpness-Aware Machine Unlearning
Machine Learning (CS)
Makes AI forget bad data without losing good data.
Sharpness-Aware Minimization: General Analysis and Improved Rates
Optimization and Control
Shows more generally when SAM works and makes it converge faster.
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Machine Learning (CS)
Makes smart computer programs learn better, faster.