Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
By: Roman Rausch, David Jansen, Sukhbinder Singh, and more
Potential Business Impact:
Makes big computer brains smaller and faster.
Large Language Models (LLMs) place heavy demands on computational resources. Low-rank decompositions of LLM weights, e.g. via Singular Value Decomposition (SVD), are a promising approach to LLM compression, but they present several practical hurdles, such as selecting appropriate layer-wise ranks and removing the parameter redundancy of the factorization. In this work, we present two physics-inspired improvements to SVD-based LLM compression: (1) FermiGrad, a gradient-descent algorithm that determines globally optimal layer-wise ranks by relaxing the discrete singular-value truncation into a continuous optimization using the Fermi function; (2) PivGa, an additional lossless compression of the low-rank factors that exploits the intrinsic gauge freedom in their parametrization.
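To make the FermiGrad idea concrete, here is a minimal sketch of the core relaxation: the hard rank cutoff on singular values is replaced by a smooth Fermi-style gate, so a per-layer threshold can be tuned by gradient descent against a differentiable effective-rank term. This is an illustration of the stated idea, not the authors' implementation; the names fermi_gate, mu, temperature, and rank_penalty are assumptions introduced here.

```python
import torch

def fermi_gate(s, mu, temperature=0.05):
    """Smooth version of the hard rule "keep s_i if s_i > mu":
    ~1 for singular values well above mu, ~0 well below (a logistic/Fermi step)."""
    return torch.sigmoid((s - mu) / temperature)

# Toy weight matrix standing in for one LLM layer.
torch.manual_seed(0)
W = torch.randn(64, 64)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)  # SVD is fixed; only mu is learned

mu = torch.tensor(1.0, requires_grad=True)           # per-layer threshold ("chemical potential")
opt = torch.optim.Adam([mu], lr=0.05)
rank_penalty = 1.0                                   # assumed trade-off weight, not from the paper

for step in range(300):
    opt.zero_grad()
    gate = fermi_gate(S, mu)
    W_approx = U @ torch.diag(S * gate) @ Vh         # soft-truncated reconstruction
    effective_rank = gate.sum()                      # differentiable proxy for the kept rank
    loss = (W - W_approx).pow(2).sum() + rank_penalty * effective_rank
    loss.backward()
    opt.step()

print(f"learned threshold mu = {mu.item():.3f}, effective rank ~ {effective_rank.item():.1f}")
```

The second ingredient, PivGa, instead rests on the fact that a low-rank factorization W ≈ A B is only defined up to an invertible gauge transformation A → A G, B → G⁻¹ B, which leaves the product unchanged; fixing this gauge lets the stored factors be compressed further without any loss.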
Similar Papers
Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM
Computation and Language
Makes smart computer programs smaller and faster.
SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression
Computation and Language
Makes big AI models smaller without losing smarts.
CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression
Machine Learning (CS)
Makes big AI models smaller, still smart.