Large Language Model Compression with Global Rank and Sparsity Optimization
By: Changhai Zhou, Qian Qiao, Weizhong Zhang, and more
Potential Business Impact:
Makes large language models smaller and faster to run.
Low-rank and sparse composite approximation is a natural way to compress Large Language Models (LLMs). However, this approach faces two primary challenges that limit the performance of existing methods. The first is the interaction and cooperation between the low-rank and sparse matrices; the second is how to allocate weights across layers, since redundancy varies considerably among them. To address these challenges, we propose a novel two-stage LLM compression method capable of global rank and sparsity optimization. Because the overall optimization space is vast, exhaustive optimization is computationally prohibitive. To reduce this space, our first stage applies robust principal component analysis to decompose the weight matrices of LLMs into low-rank and sparse components, which span the low-dimensional and sparse spaces containing the resulting low-rank and sparse matrices, respectively. In the second stage, we propose a probabilistic global optimization technique to jointly identify the low-rank and sparse structures within these two spaces. An appealing feature of our approach is its ability to automatically detect redundancy across layers and to manage the interaction between the sparse and low-rank components. Extensive experimental results indicate that our method significantly surpasses state-of-the-art techniques for sparsification and composite approximation.
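As a rough illustration of the first stage only, the sketch below decomposes a weight matrix into a low-rank plus a sparse part using a generic robust PCA (principal component pursuit) routine. This is not the paper's implementation: the solver, the function names (rpca_pcp, svt, soft), and the parameter defaults are assumptions made for the example.

import numpy as np

def svt(X, tau):
    # Singular value thresholding: shrink singular values of X by tau.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def soft(X, tau):
    # Elementwise soft-thresholding (sparsity-inducing shrinkage).
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_pcp(W, lam=None, mu=None, tol=1e-6, max_iter=500):
    # Decompose W ~ L + S with L low-rank and S sparse, via principal
    # component pursuit solved with a simple ADMM-style alternating scheme.
    m, n = W.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # standard PCP weight
    if mu is None:
        mu = 0.25 * m * n / (np.abs(W).sum() + 1e-12)
    L = np.zeros_like(W)
    S = np.zeros_like(W)
    Y = np.zeros_like(W)                        # dual variable
    norm_W = np.linalg.norm(W, "fro") + 1e-12
    for _ in range(max_iter):
        L = svt(W - S + Y / mu, 1.0 / mu)       # low-rank update
        S = soft(W - L + Y / mu, lam / mu)      # sparse update
        R = W - L - S                           # residual
        Y = Y + mu * R                          # dual ascent
        if np.linalg.norm(R, "fro") / norm_W < tol:
            break
    return L, S

# Toy usage: a synthetic "weight matrix" with a rank-8 part plus sparse outliers.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))
W[rng.random(W.shape) < 0.02] += 5.0
L, S = rpca_pcp(W)
print("approx rank(L):", np.linalg.matrix_rank(L, tol=1e-3),
      "nonzeros in S:", int((np.abs(S) > 1e-3).sum()))

In the paper's pipeline, components like L and S would define the low-dimensional and sparse search spaces that the second, probabilistic stage then explores to set per-layer ranks and sparsity levels; that stage is not sketched here.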
Similar Papers
1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models
Computation and Language
Makes big AI models smaller and faster.
Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM
Computation and Language
Makes smart computer programs smaller and faster.
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
CV and Pattern Recognition
Teaches computers new things with very little data.