Convergence of Momentum-Based Optimization Algorithms with Time-Varying Parameters
By: Mathukumalli Vidyasagar
Potential Business Impact:
Provides convergence guarantees for momentum-based training methods (including heavy-ball and Nesterov-style variants with changing momentum), supporting faster and more reliable machine learning even when gradients must be estimated from a few function evaluations.
In this paper, we present a unified algorithm for stochastic optimization that makes use of a "momentum" term; in other words, the stochastic gradient depends not only on the current true gradient of the objective function, but also on the true gradient at the previous iteration. Our formulation includes the Stochastic Heavy Ball (SHB) and the Stochastic Nesterov Accelerated Gradient (SNAG) algorithms as special cases. In addition, in our formulation, the momentum term is allowed to vary as a function of time (i.e., the iteration counter). The assumptions on the stochastic gradient are the most general in the literature, in that it can be biased, and have a conditional variance that grows in an unbounded fashion as a function of time. This last feature is crucial in order to make the theory applicable to "zero-order" methods, where the gradient is estimated using just two function evaluations. We present a set of sufficient conditions for the convergence of the unified algorithm. These conditions are natural generalizations of the familiar Robbins-Monro and Kiefer-Wolfowitz-Blum conditions for standard stochastic gradient descent. We also analyze another method from the literature for the SHB algorithm with a time-varying momentum parameter, and show that it is impracticable.
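To make the setting concrete, the Python sketch below implements a generic two-parameter momentum recursion with time-varying step size and momentum, driven by a two-point zeroth-order gradient estimate of the kind the abstract alludes to. The particular recursion, the helper names two_point_gradient and unified_momentum, and the schedules alpha(t), c(t), mu(t) are illustrative assumptions rather than the paper's exact formulation; the comments note how Stochastic Heavy Ball and Stochastic Nesterov Accelerated Gradient arise as special cases of this common unification.

import numpy as np


def two_point_gradient(f, x, c_t, rng):
    # Zeroth-order gradient estimate from just two function evaluations
    # (random-direction two-point scheme; one common choice for the
    # "zero-order" setting).  Its bias shrinks with the probing radius c_t,
    # while its conditional variance grows like 1/c_t**2, i.e. without
    # bound as c_t -> 0, matching the kind of noise the abstract allows.
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return d * (f(x + c_t * u) - f(x - c_t * u)) / (2.0 * c_t) * u


def unified_momentum(f, x0, alpha, momentum_a, momentum_b, radius, n_iters, seed=0):
    # Generic two-parameter momentum iteration with time-varying parameters:
    #     y_t     = x_t + a_t (x_t - x_{t-1})
    #     x_{t+1} = x_t + b_t (x_t - x_{t-1}) - alpha_t * g_t(y_t)
    # a_t = 0, b_t = mu_t  gives the Stochastic Heavy Ball update;
    # a_t = b_t = mu_t     gives Stochastic Nesterov Accelerated Gradient.
    # (This is one standard unification; the paper's exact recursion and
    # convergence conditions should be taken from the paper itself.)
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for t in range(n_iters):
        y = x + momentum_a(t) * (x - x_prev)
        g = two_point_gradient(f, y, radius(t), rng)
        x_next = x + momentum_b(t) * (x - x_prev) - alpha(t) * g
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    f = lambda z: 0.5 * float(z @ z)            # toy quadratic objective
    # Illustrative Robbins-Monro / Kiefer-Wolfowitz-Blum style schedules:
    # step sizes with divergent sum but summable squares, probing radius -> 0.
    alpha = lambda t: 1.0 / (t + 1)
    radius = lambda t: 1.0 / (t + 1) ** 0.25
    mu = lambda t: 0.9 * t / (t + 3)            # time-varying momentum
    x_shb = unified_momentum(f, np.ones(5), alpha, lambda t: 0.0, mu, radius, 5000)
    x_snag = unified_momentum(f, np.ones(5), alpha, mu, mu, radius, 5000)
    print("SHB  final iterate norm:", np.linalg.norm(x_shb))
    print("SNAG final iterate norm:", np.linalg.norm(x_snag))

The schedules above follow the Robbins-Monro / Kiefer-Wolfowitz-Blum pattern only informally; the precise conditions under which such time-varying momentum recursions converge are the subject of the paper.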
Similar Papers
First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms
Machine Learning (CS)
Studies how momentum-based stochastic gradient descent behaves under time-varying step sizes, helping make training faster.
Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation
Machine Learning (CS)
Combines heavy-ball momentum with predictive extrapolation to make gradient descent more robust, helping computers learn faster on difficult problems.
Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
Optimization and Control
Analyzes how quickly normalized momentum methods converge when the gradient noise is heavy-tailed, supporting faster and more reliable training.