Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation
By: Sarwan Ali
Potential Business Impact:
Helps computers learn faster on tricky problems.
Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes due to aggressive momentum accumulation. We propose Heavy-Ball Synthetic Gradient Extrapolation (HB-SGE), a robust first-order method that combines heavy-ball momentum with predictive gradient extrapolation. Unlike classical momentum methods that accumulate historical gradients, HB-SGE estimates future gradient directions using local Taylor approximations, providing adaptive acceleration while maintaining stability. We prove convergence guarantees for strongly convex functions and demonstrate empirically that HB-SGE prevents divergence on problems where NAG and standard momentum fail. On ill-conditioned quadratics (condition number $\kappa = 50$), HB-SGE converges in 119 iterations while both SGD and NAG diverge. On the non-convex Rosenbrock function, HB-SGE achieves convergence in 2,718 iterations where classical momentum methods diverge within 10 steps. While NAG remains faster on well-conditioned problems, HB-SGE provides a robust alternative that still delivers a speedup over SGD across diverse landscapes, requiring only $O(d)$ memory overhead and the same hyperparameters as standard momentum.
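The abstract does not spell out the HB-SGE update rule, so the following is only a minimal sketch of the idea: heavy-ball momentum driven by an extrapolated gradient, here assumed to be a first-order (secant) Taylor estimate of the next gradient, $g_t + (g_t - g_{t-1})$. The function name hb_sge, the hyperparameter values, and the test quadratic are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hb_sge(grad_fn, x0, lr=1e-2, beta=0.9, max_iter=5000, tol=1e-8):
    """Heavy-ball momentum with a predicted (extrapolated) gradient.

    Assumption: the "predictive extrapolation" is approximated by a
    first-order difference of successive gradients, g_t + (g_t - g_{t-1}).
    The paper's exact extrapolation rule may differ.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)      # heavy-ball velocity term
    g_prev = grad_fn(x)       # gradient at the previous iterate
    for t in range(max_iter):
        g = grad_fn(x)
        # Local Taylor-style extrapolation of the upcoming gradient direction.
        g_pred = g + (g - g_prev)
        # Heavy-ball update driven by the predicted gradient.
        v = beta * v - lr * g_pred
        x = x + v
        g_prev = g
        if np.linalg.norm(g) < tol:
            return x, t + 1
    return x, max_iter

# Illustrative test: ill-conditioned quadratic f(x) = 0.5 x^T diag(1, 50) x
# (condition number 50, matching the regime discussed in the abstract).
A = np.diag([1.0, 50.0])
grad = lambda x: A @ x
x_star, iters = hb_sge(grad, x0=[5.0, 5.0], lr=1e-2, beta=0.9)
print(f"converged to {x_star} in {iters} iterations")
```

One plausible reading of why such a scheme stays stable is that the extrapolated gradient reacts to how quickly the gradient is changing along the current trajectory, damping the step before accumulated momentum can overshoot; the sketch above is only meant to make that mechanism concrete, not to reproduce the reported iteration counts.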
Similar Papers
Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis
Machine Learning (CS)
Makes computer learning faster and more accurate.
Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters
Optimization and Control
Makes computer learning faster and more accurate.
Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods
Machine Learning (CS)
Makes private AI learn better and more reliably.