Dynamic Low-rank Approximation of Full-Matrix Preconditioner for Training Generalized Linear Models
By: Tatyana Matveeva, Aleksandr Katrutsa, Evgeny Frolov
Potential Business Impact:
Makes training machine learning models faster with smarter matrix math.
Adaptive gradient methods such as Adagrad and its variants are widespread in large-scale optimization. However, their use of diagonal preconditioning matrices limits their ability to capture correlations between parameters. Full-matrix adaptive methods, which approximate the exact Hessian, can model these correlations and may enable faster convergence. At the same time, their computational and memory costs are often prohibitive for large-scale models. To address this limitation, we propose AdaGram, an optimizer that enables efficient full-matrix adaptive gradient updates. To reduce memory and computational overhead, we use fast symmetric factorization to compute the preconditioned update direction at each iteration. Additionally, we maintain the low-rank structure of the preconditioner along the optimization trajectory using matrix integrator methods. Numerical experiments on standard machine learning tasks show that AdaGram converges faster than or matches diagonal adaptive optimizers when using preconditioner approximations of rank five or lower. This demonstrates AdaGram's potential as a scalable solution for adaptive optimization in large models.
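To make the idea concrete, below is a minimal, hypothetical sketch of a low-rank full-matrix adaptive gradient step, loosely inspired by the abstract. It is not the authors' AdaGram algorithm: the factor update uses a plain truncated SVD as a stand-in for the paper's matrix integrator, and all names and parameters (lowrank_adagrad_step, rank, eps) are illustrative assumptions. It only shows how a preconditioner G ≈ U Uᵀ with a small rank lets one apply (G + εI)^{-1/2} to a gradient without ever forming a d×d matrix.

```python
# Illustrative sketch only: low-rank full-matrix adaptive step (not the paper's AdaGram).
import numpy as np

def lowrank_adagrad_step(x, g, U, lr=0.1, eps=1e-4, rank=5):
    """One update with preconditioner G ~ U @ U.T, where U is d x rank.

    Returns the new parameters and the updated low-rank factor.
    """
    # Fold the new gradient into the factor and re-truncate to the target rank.
    # (Simple stand-in for the matrix-integrator update of the preconditioner.)
    augmented = np.column_stack([U, g])
    Q, s, _ = np.linalg.svd(augmented, full_matrices=False)
    Q, s = Q[:, :rank], s[:rank]
    U_new = Q * s  # columns scaled by singular values, so U_new @ U_new.T ~ G

    # Apply (G + eps*I)^{-1/2} g without forming the d x d matrix, using
    # (Q diag(s^2) Q^T + eps I)^{-1/2}
    #   = eps^{-1/2} I + Q diag((s^2 + eps)^{-1/2} - eps^{-1/2}) Q^T.
    coeff = 1.0 / np.sqrt(s**2 + eps) - 1.0 / np.sqrt(eps)
    direction = g / np.sqrt(eps) + Q @ (coeff * (Q.T @ g))

    return x - lr * direction, U_new

# Tiny usage example on a least-squares objective.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
x, U = np.zeros(10), np.zeros((10, 5))
for _ in range(200):
    grad = A.T @ (A @ x - b)
    x, U = lowrank_adagrad_step(x, grad, U)
```

The point of the sketch is the cost model: each step touches only a d×(rank+1) SVD and a few matrix-vector products, so memory and compute stay linear in the number of parameters for a fixed small rank, which is the regime the abstract targets with rank-five and smaller approximations.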
Similar Papers
Efficient Low-Tubal-Rank Tensor Estimation via Alternating Preconditioned Gradient Descent
Machine Learning (CS)
Makes computer math problems solve much faster.
Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
Machine Learning (CS)
Makes computer learning faster and more accurate.
Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization
Optimization and Control
Makes computer learning faster when it's too complex.