SVRG and Beyond via Posterior Correction
By: Nico Daheim, Thomas Möllenhoff, Ming Liang Ang, and more
Potential Business Impact:
Speeds up training of deep learning models, including Transformer language models, by reducing noise in gradient updates.
Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed up training by using gradient corrections, but have seen limited success in deep learning. Here, we show surprising new foundational connections of SVRG to a recently proposed Bayesian method called posterior correction. Specifically, we show that SVRG is recovered as a special case of posterior correction over the isotropic-Gaussian family, while novel extensions are automatically obtained by using more flexible exponential families. We derive two new SVRG variants by using Gaussian families: first, a Newton-like variant that employs novel Hessian corrections, and second, an Adam-like extension that improves pretraining and finetuning of Transformer language models. This is the first work to connect SVRG to Bayes and use it to boost variational training for deep networks.
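To make the gradient-correction idea concrete, below is a minimal sketch of the classic SVRG update on a least-squares problem. It illustrates only standard SVRG, not the paper's posterior-correction variants, and all names here (svrg, A, b, lr) are illustrative assumptions rather than anything from the paper.

```python
# Minimal sketch of the classic SVRG gradient correction on a least-squares
# objective (1/2n) * ||A w - b||^2. Illustrative only; this is plain SVRG,
# not the posterior-correction variants proposed in the paper.
import numpy as np


def svrg(A, b, lr=0.05, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        # Snapshot the current weights and compute the full-batch gradient.
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - b) / n
        for _ in range(n):
            i = rng.integers(n)
            # Per-example gradients at the current and snapshot weights.
            g_i = A[i] * (A[i] @ w - b[i])
            g_i_snap = A[i] * (A[i] @ w_snap - b[i])
            # SVRG correction: stochastic gradient minus its snapshot value,
            # recentred by the full gradient (unbiased, lower variance).
            w -= lr * (g_i - g_i_snap + full_grad)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(200, 10))
    w_true = rng.normal(size=10)
    b = A @ w_true + 0.01 * rng.normal(size=200)
    w_hat = svrg(A, b)
    print("distance to true weights:", np.linalg.norm(w_hat - w_true))
```

The paper's contribution is to reinterpret the correction term above through posterior correction over exponential families, which yields the Newton-like and Adam-like extensions described in the abstract.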
Similar Papers
On the convergence of stochastic variance reduced gradient for linear inverse problems
Numerical Analysis
Analyzes when and how fast SVRG converges for linear inverse problems.
Convergence Analysis of alpha-SVRG under Strong Convexity
Machine Learning (CS)
Provides convergence guarantees for the alpha-SVRG variant under strong convexity.
Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN)
Machine Learning (CS)
Quantifies how confident model predictions are in scientific machine learning tasks.