Sharpness-Aware Minimization with Z-Score Gradient Filtering
By: Vincent-Daniel Yun
Potential Business Impact:
Filters out noisy gradient signals to help AI models generalize better.
Deep neural networks achieve high performance across many domains but can still generalize poorly when optimization is influenced by small or noisy gradient components. Sharpness-Aware Minimization (SAM) improves generalization by perturbing parameters toward directions of high curvature, but because it uses the entire gradient vector, small or noisy components can distort the ascent step and steer the optimizer away from optimal solutions. We propose Z-Score Filtered Sharpness-Aware Minimization, which applies Z-score based filtering to the gradients of each layer. Instead of using all gradient components, a mask is constructed to retain only those whose absolute Z-scores fall in the top percentile. The percentile threshold $Q_p$ determines how many components are kept, so the ascent step focuses on the directions that stand out most relative to the layer's average gradient. This selective perturbation refines the search toward flatter minima while reducing the influence of less significant gradients. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet with architectures including ResNet, VGG, and Vision Transformers show that the proposed method consistently improves test accuracy over SAM and its variants.
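To make the filtering step concrete, here is a minimal PyTorch sketch of how a per-layer Z-score mask might gate SAM's ascent perturbation. The function name `zsam_ascent_step` and the arguments `rho`, `keep_pct` (playing the role of the percentile threshold $Q_p$), and `eps` are illustrative assumptions, not the paper's released implementation.

```python
import torch

@torch.no_grad()
def zsam_ascent_step(params, rho=0.05, keep_pct=50.0, eps=1e-12):
    """Z-score-filtered SAM ascent step (illustrative sketch).

    For each layer, gradient components are standardized and only the
    fraction with the largest |Z| (the top `keep_pct` percent, i.e. Q_p)
    is kept; the SAM perturbation is then built from the filtered
    gradients. Returns the per-tensor perturbations so the caller can
    undo them before the descent update.
    """
    filtered = []
    for p in params:
        if p.grad is None:
            filtered.append(None)
            continue
        g = p.grad
        # Per-layer Z-score of each gradient component.
        z = (g - g.mean()) / (g.std() + eps)
        # Keep components whose |Z| reaches the top `keep_pct` percent.
        thresh = torch.quantile(z.abs().flatten(), 1.0 - keep_pct / 100.0)
        filtered.append(g * (z.abs() >= thresh).to(g.dtype))

    # Global norm of the filtered gradient, as in standard SAM scaling.
    norm = torch.sqrt(sum((f * f).sum() for f in filtered if f is not None))

    perturbs = []
    for p, f in zip(params, filtered):
        if f is None:
            perturbs.append(None)
            continue
        e = rho * f / (norm + eps)  # ascent step along the filtered direction
        p.add_(e)                   # perturb the weights in place
        perturbs.append(e)
    return perturbs
```

In a full training step this would replace SAM's first step: backpropagate once, call the sketch above, recompute the loss and gradients at the perturbed weights, subtract the returned perturbations, and then apply the base optimizer's update, mirroring SAM's two-step procedure.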
Similar Papers
Sharpness-Aware Data Generation for Zero-shot Quantization
Machine Learning (CS)
Makes AI learn better without seeing real examples.
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
Machine Learning (CS)
Makes machine learning models work better.
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
CV and Pattern Recognition
Makes AI smarter with less computer power.