Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
By: Zhijun Tu, Hanting Chen, Siqi Liu, and others
Potential Business Impact:
Makes big AI models smaller, faster, cheaper.
1-bit LLM quantization offers significant advantages in reducing storage and computational costs. However, existing methods typically train 1-bit LLMs from scratch, failing to fully leverage pre-trained models, which results in high training costs and notable accuracy degradation. We identify that the large gap between full-precision and 1-bit representations makes direct adaptation difficult. In this paper, we introduce a consistent progressive training scheme for both the forward and backward passes that smoothly converts the floating-point weights into binarized ones. Additionally, we incorporate binary-aware initialization and dual-scaling compensation to reduce the difficulty of progressive training and improve performance. Experimental results on LLMs of various sizes demonstrate that our method outperforms existing approaches. Our results show that high-performance 1-bit LLMs can be achieved from pre-trained models, eliminating the need for expensive training from scratch.
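To make the idea of "smoothly converting floating-point weights into binarized ones" concrete, below is a minimal PyTorch sketch of progressive weight binarization. It is an illustration under stated assumptions, not the authors' implementation: the linear blending schedule `lam`, the per-channel mean-absolute-value scale, and the straight-through gradient estimator are all assumptions (the paper's dual-scaling compensation and binary-aware initialization are not reproduced here).

```python
import torch
import torch.nn as nn

class ProgressiveBinaryLinear(nn.Module):
    """Linear layer whose effective weights are blended from full precision
    toward a 1-bit (sign) representation as training progresses.
    Illustrative sketch only; details differ from the paper's method."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def binarize(self, w):
        # Per-output-channel scale (mean absolute value) is a common choice
        # for 1-bit quantization; assumed here for illustration.
        scale = w.abs().mean(dim=1, keepdim=True)
        w_bin = torch.sign(w) * scale
        # Straight-through estimator: forward uses w_bin,
        # gradients flow to the latent full-precision weight w.
        return w + (w_bin - w).detach()

    def forward(self, x, lam):
        # lam ramps from 0 (fully floating-point) to 1 (fully binarized),
        # so the forward pass transitions smoothly between the two.
        w_eff = (1.0 - lam) * self.weight + lam * self.binarize(self.weight)
        return nn.functional.linear(x, w_eff)

# Usage: anneal lam from 0 to 1 over training steps (linear ramp assumed).
layer = ProgressiveBinaryLinear(16, 8)
x = torch.randn(4, 16)
for step in range(100):
    lam = min(1.0, step / 80)
    y = layer(x, lam)
```

Because both the forward computation and the gradients see the same blended weights, the transition to 1-bit weights is gradual rather than an abrupt swap, which is the intuition behind the progressive training described above.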
Similar Papers
LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi
Machine Learning (CS)
Makes smart computer talk work on small devices.
Binary Quantization For LLMs Through Dynamic Grouping
Machine Learning (CS)
Makes AI models much smaller and faster.
Quantizing Large Language Models for Code Generation: A Differentiated Replication
Software Engineering
Makes big computer brains smaller for coding.