Understanding and Improving the Shampoo Optimizer via Kullback-Leibler Minimization
By: Wu Lin, Scott C. Lowe, Felix Dangel, and more
Potential Business Impact:
Makes computer learning faster while using less memory.
As an adaptive method, Shampoo employs a structured second-moment estimation, and its effectiveness has attracted growing attention. Prior work has primarily analyzed its estimation scheme through the Frobenius norm. Motivated by the natural connection between the second moment and a covariance matrix, we propose studying Shampoo's estimation as covariance estimation through the lens of Kullback-Leibler (KL) minimization. This alternative perspective reveals a previously hidden limitation, motivating improvements to Shampoo's design. Building on this insight, we develop a practical estimation scheme, termed KL-Shampoo, that eliminates Shampoo's reliance on Adam for stabilization, thereby removing the additional memory overhead introduced by Adam. Preliminary results show that KL-Shampoo improves Shampoo's performance, enabling it to stabilize without Adam and even outperform its Adam-stabilized variant, SOAP, in neural network pretraining.
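The abstract frames Shampoo's structured second-moment estimate as a covariance estimate judged by Kullback-Leibler divergence. The sketch below is a toy illustration of that framing only, not the paper's KL-Shampoo scheme: the helper names (kron_factors, gaussian_kl), the EMA decay beta, and the crude trace-based rescaling are all illustrative assumptions. It builds Shampoo's standard Kronecker factors from a batch of gradients and scores the resulting Kronecker-structured covariance against the full empirical second moment with the zero-mean Gaussian KL divergence.

```python
# Illustrative sketch (assumptions, not the paper's algorithm): compare a
# Shampoo-style Kronecker-factored covariance estimate against the full
# empirical second moment using the KL divergence between zero-mean Gaussians.
import numpy as np

def kron_factors(grads, beta=0.95, eps=1e-8):
    """Shampoo-style factors: L accumulates G G^T, R accumulates G^T G (EMA)."""
    m, n = grads[0].shape
    L, R = eps * np.eye(m), eps * np.eye(n)
    for G in grads:
        L = beta * L + (1 - beta) * (G @ G.T)
        R = beta * R + (1 - beta) * (G.T @ G)
    return L, R

def gaussian_kl(S, P):
    """KL( N(0, S) || N(0, P) ) for symmetric positive-definite S and P."""
    d = S.shape[0]
    P_inv_S = np.linalg.solve(P, S)          # P^{-1} S
    return 0.5 * (np.trace(P_inv_S) - d - np.linalg.slogdet(P_inv_S)[1])

# Toy comparison on synthetic gradients.
rng = np.random.default_rng(0)
grads = [rng.standard_normal((4, 3)) for _ in range(200)]
vecs = np.stack([G.reshape(-1) for G in grads])            # row-major vec(G)
S_full = vecs.T @ vecs / len(grads) + 1e-8 * np.eye(12)    # full second moment
L, R = kron_factors(grads)
P_kron = np.kron(L, R)                                     # structured estimate
P_kron *= np.trace(S_full) / np.trace(P_kron)              # crude scale matching
print("KL(full || Kronecker estimate):", gaussian_kl(S_full, P_kron))
```

Under this framing, a smaller KL value means the structured estimate is a better covariance surrogate for the full second moment; the paper's contribution is to optimize Shampoo's estimation scheme with this KL criterion in mind rather than the Frobenius norm used in prior analyses.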
Similar Papers
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Machine Learning (CS)
Makes computer learning faster and simpler.
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
Optimization and Control
Makes computer learning faster and better.
A Stable Whitening Optimizer for Efficient Neural Network Training
Machine Learning (CS)
Makes computer learning faster and more stable.