Stochastic Difference-of-Convex Optimization with Momentum
By: El Mahdi Chayti, Martin Jaggi
Potential Business Impact:
Makes machine learning training work reliably with smaller data batches.
Stochastic difference-of-convex (DC) optimization is prevalent in many machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, limiting their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded-variance assumptions on the concave part, for any batch size. We also prove that without momentum, convergence may fail regardless of the stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.
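To make the idea concrete, here is a minimal, hypothetical sketch of a momentum-based stochastic DC update: the concave part -h is linearized with a moving-average (momentum) estimate of its stochastic gradient, and a gradient step is taken on the resulting convex surrogate. This is not the paper's actual algorithm; the update rule, the helper names (stochastic_dc_momentum, grad_g, stoch_grad_h), and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def stochastic_dc_momentum(grad_g, stoch_grad_h, x0, steps=2000,
                           lr=0.05, beta=0.1, seed=0):
    """Illustrative sketch (not the authors' method) of a momentum-based
    stochastic DC step for minimizing f(x) = g(x) - h(x), g and h convex.

    Each iteration linearizes the concave part -h using a momentum
    (moving-average) estimate of the stochastic gradient of h, then takes
    a gradient step on the resulting convex surrogate.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = stoch_grad_h(x, rng)              # momentum buffer for grad h
    for _ in range(steps):
        # Momentum update: blend the buffer with a fresh stochastic sample.
        v = (1.0 - beta) * v + beta * stoch_grad_h(x, rng)
        # Step on the surrogate g(x) - <v, x>; its gradient is grad_g(x) - v.
        x = x - lr * (grad_g(x) - v)
    return x

# Toy usage: g(x) = 0.5 * ||x - a||^2 and h(x) = ||x||_1 (both convex),
# with noisy subgradients of h standing in for small-batch estimates.
a = np.array([1.0, -2.0, 0.5])
grad_g = lambda x: x - a
stoch_grad_h = lambda x, rng: np.sign(x) + 0.1 * rng.standard_normal(x.shape)
x_hat = stochastic_dc_momentum(grad_g, stoch_grad_h, x0=np.zeros(3))
print(x_hat)
```

In this sketch the momentum buffer v averages out per-sample noise, which is what keeps the linearization of the concave part stable even when each stoch_grad_h call corresponds to a very small batch.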
Similar Papers
Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization
Machine Learning (CS)
Makes computers learn faster with less data.
First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms
Machine Learning (CS)
Makes computer learning faster with changing steps.
Better LMO-based Momentum Methods with Second-Order Information
Optimization and Control
Makes computer learning faster and better.