LOST: Low-rank and Sparse Pre-training for Large Language Models
By: Jiaxi Li, Lu Yin, Li Shen and more
Potential Business Impact:
Makes big computer brains train faster and cheaper.
While large language models (LLMs) have achieved remarkable performance across a wide range of tasks, their massive scale incurs prohibitive computational and memory costs for pre-training from scratch. Recent studies have investigated low-rank parameterization as a means of reducing model size and training cost. In this context, sparsity is often employed as a complementary technique to recover important information lost in low-rank compression by capturing salient features in the residual space. However, existing approaches typically combine low-rank and sparse components in a simplistic or ad hoc manner, often resulting in undesirable performance degradation compared to full-rank training. In this paper, we propose LOw-rank and Sparse pre-Training (LOST) for LLMs, a novel method that integrates low-rank and sparse structures to enable effective training of LLMs from scratch under strict efficiency constraints. LOST applies singular value decomposition to weight matrices, preserving the dominant low-rank components, while allocating the remaining singular values to construct channel-wise sparse components that complement the expressiveness of low-rank training. We evaluate LOST on LLM pre-training across model sizes ranging from 60M to 7B parameters. Our experiments show that LOST achieves competitive or superior performance compared to full-rank models, while significantly reducing both memory and compute overhead. Code is available at https://github.com/JiaxiLi1/LOST-Low-rank-and-Sparse-Training-for-Large-Language-Models
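To make the decomposition described in the abstract concrete, the sketch below shows one plausible way to split a weight matrix into a dominant low-rank part (top-r singular triplets) and a channel-wise sparse residual built from the remaining spectrum. This is a minimal illustration based only on the abstract, not the paper's actual algorithm; the function name, the sparsity parameter, and the choice of selecting output channels by residual row norm are all assumptions.

```python
import torch

def lowrank_plus_sparse_init(W: torch.Tensor, rank: int, sparsity: float = 0.9):
    """Illustrative sketch (assumed, not the paper's exact method):
    split W into low-rank factors from the top-r singular values and a
    channel-wise sparse component built from the remaining singular values."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)

    # Low-rank factors from the top-r singular values: W_lr = B @ A.
    B = U[:, :rank] * S[:rank]   # (out_dim, rank)
    A = Vh[:rank, :]             # (rank, in_dim)

    # Residual reconstructed from the remaining singular values.
    residual = (U[:, rank:] * S[rank:]) @ Vh[rank:, :]

    # Channel-wise sparsity (assumption): keep only the output channels (rows)
    # with the largest residual energy, zero out the rest.
    channel_norms = residual.norm(dim=1)
    k = max(1, int((1.0 - sparsity) * residual.shape[0]))
    keep = torch.topk(channel_norms, k).indices
    mask = torch.zeros(residual.shape[0], dtype=torch.bool)
    mask[keep] = True
    sparse_part = residual * mask.unsqueeze(1)

    # A forward pass would then use x @ (B @ A + sparse_part).T, with B, A,
    # and the nonzero channels of sparse_part as the trainable parameters.
    return B, A, sparse_part
```

Under this reading, only the two low-rank factors and the few retained channels need to be stored and updated during training, which is where the memory and compute savings over full-rank training would come from.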
Similar Papers
1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models
Computation and Language
Makes big AI models smaller and faster.
Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes
Computation and Language
Saves space by storing only important AI changes.
Large Language Model Compression with Global Rank and Sparsity Optimization
Machine Learning (CS)
Makes big computer brains smaller and faster.