FOAM: Blocked State Folding for Memory-Efficient LLM Training
By: Ziqing Wen, Jiahuan Wang, Ping Luo, and more
Potential Business Impact:
Makes AI training use half the computer memory.
Large language models (LLMs) have demonstrated remarkable performance due to their large parameter counts and extensive training data. However, their scale leads to significant memory bottlenecks during training, especially when using memory-intensive optimizers like Adam. Existing memory-efficient approaches often rely on techniques such as singular value decomposition (SVD), projections, or weight freezing, which can introduce substantial computational overhead, require additional memory for projections, or degrade model performance. In this paper, we propose Folded Optimizer with Approximate Moment (FOAM), a method that compresses optimizer states by computing block-wise gradient means and incorporates a residual correction to recover lost information. Theoretically, FOAM achieves convergence rates equivalent to vanilla Adam under standard non-convex optimization settings. Empirically, FOAM reduces total training memory by approximately 50%, eliminates up to 90% of optimizer state memory overhead, and accelerates convergence. Furthermore, FOAM is compatible with other memory-efficient optimizers, delivering performance and throughput that match or surpass both full-rank and existing memory-efficient baselines.
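The abstract does not spell out FOAM's exact update rule, but the core idea it describes, replacing per-parameter Adam moments with one moment per block of gradients ("folding") plus a residual correction that reinjects the within-block information lost by the averaging, can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' algorithm: the `FoldedAdam` name, the `block_size` parameter, and the particular residual scaling are all assumptions made for the example.

```python
# Illustrative sketch only: stores Adam moments per gradient block (one scalar
# per block) instead of per parameter, then adds a residual correction.
import torch
import torch.nn.functional as F

class FoldedAdam(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, block_size=128):
        defaults = dict(lr=lr, betas=betas, eps=eps, block_size=block_size)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, (b1, b2) = group["lr"], group["betas"]
            eps, bs = group["eps"], group["block_size"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad.reshape(-1)
                n_blocks = (g.numel() + bs - 1) // bs
                pad = n_blocks * bs - g.numel()
                g_blocks = F.pad(g, (0, pad)).view(n_blocks, bs)

                state = self.state[p]
                if not state:
                    state["step"] = 0
                    # Folded moments: one scalar per block, not per weight,
                    # which is where the optimizer-state memory saving comes from.
                    state["m"] = torch.zeros(n_blocks, device=p.device)
                    state["v"] = torch.zeros(n_blocks, device=p.device)
                state["step"] += 1
                t = state["step"]

                # Block-wise mean of the gradient (the "folded" gradient).
                g_mean = g_blocks.mean(dim=1)
                state["m"].mul_(b1).add_(g_mean, alpha=1 - b1)
                state["v"].mul_(b2).addcmul_(g_mean, g_mean, value=1 - b2)

                m_hat = state["m"] / (1 - b1 ** t)
                v_hat = state["v"] / (1 - b2 ** t)
                precond = v_hat.sqrt().add_(eps).unsqueeze(1)

                # Folded Adam direction broadcast to every weight in the block,
                # plus the within-block deviation of the gradient as a residual
                # correction, both preconditioned by the folded second moment.
                folded_dir = m_hat.unsqueeze(1) / precond
                residual = (g_blocks - g_mean.unsqueeze(1)) / precond
                update = (folded_dir + residual).view(-1)[: g.numel()].view_as(p)

                p.add_(update, alpha=-lr)
```

In this toy version the state tensors `m` and `v` shrink by a factor of `block_size`, which is the kind of reduction behind the "up to 90% of optimizer state memory" figure; the real FOAM residual correction and its convergence guarantees are detailed in the paper itself.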
Similar Papers
Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints
Machine Learning (CS)
Trains big AI models with less computer memory.
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Machine Learning (CS)
Saves computer memory for faster AI training.
Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-Training and Inference
Machine Learning (CS)
Makes AI models use less computer memory.