Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

Published: September 22, 2025 | arXiv ID: 2509.18001v1

By: Haocheng Luo, Mehrtash Harandi, Dinh Phung, and more

Potential Business Impact:

Improves how well machine learning models generalize, helping trained models perform better on data they have not seen before.

Business Areas:
A/B Testing; Data and Analytics

Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size used for computing perturbations decreases. Leveraging an extended Stochastic Differential Equation (SDE) framework, combined with an analysis of the structure of stochastic gradient noise (SGN), we precisely characterize the dynamics of various SAM variants. Our findings reveal that the stochastic noise introduced during SAM perturbations inherently induces a variance-based sharpness regularization effect. Motivated by these theoretical insights, we introduce Reweighted SAM, which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate the effectiveness of our theoretical analysis and the proposed method.
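To make the m-sharpness setup concrete, below is a minimal sketch of an m-SAM-style training step in PyTorch: the ascent perturbation is computed independently on each micro-batch of size m, and the gradients at the perturbed points are averaged. This is an illustrative reconstruction based on the abstract, not the paper's implementation; the function name m_sam_step, the rho value, and the micro-batch split are assumptions.

```python
# Illustrative m-SAM step (assumed API, not the paper's code):
# each micro-batch gets its own SAM perturbation before gradients are averaged.
import torch

def m_sam_step(model, loss_fn, x, y, optimizer, rho=0.05, m=32):
    params = [p for p in model.parameters() if p.requires_grad]
    avg_grads = [torch.zeros_like(p) for p in params]
    micro_batches = list(zip(x.split(m), y.split(m)))

    for xb, yb in micro_batches:
        # 1) Gradient on the clean micro-batch gives the ascent direction,
        #    carrying this micro-batch's stochastic gradient noise.
        loss = loss_fn(model(xb), yb)
        grads = torch.autograd.grad(loss, params)
        grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

        # 2) SAM's first step: perturb weights by rho * g / ||g||.
        with torch.no_grad():
            eps = [rho * g / grad_norm for g in grads]
            for p, e in zip(params, eps):
                p.add_(e)

        # 3) SAM's second step: gradient at the perturbed point.
        perturbed_loss = loss_fn(model(xb), yb)
        perturbed_grads = torch.autograd.grad(perturbed_loss, params)

        # 4) Undo the perturbation and accumulate the perturbed gradient.
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)
            for a, g in zip(avg_grads, perturbed_grads):
                a.add_(g / len(micro_batches))

    # 5) Apply the averaged perturbed gradient with the base optimizer.
    with torch.no_grad():
        for p, a in zip(params, avg_grads):
            p.grad = a
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```

Because each micro-batch's perturbation uses only its own noisy gradient, shrinking m injects more perturbation noise, which, per the paper's analysis, acts as a variance-based sharpness regularizer; the authors' Reweighted SAM aims to recover this benefit without the sequential per-micro-batch loop.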

Country of Origin
🇦🇺 Australia

Page Count
36 pages

Category
Computer Science:
Machine Learning (CS)