The Fourth State: Signed-Zero Ternary for Stable LLM Quantization (and More)
By: Jeffrey Uhlmann
Potential Business Impact:
Lets large language models run faster and with less power by storing weights in just 2 bits.
Quantization is usually regarded as a way to trade model quality for reduced compute requirements, i.e., as a suboptimal approximation. Examined under a fixed overall resource budget, however, a very different perspective arises. We introduce Signed-Zero Ternary (SZT), a 2-bit quantization that deterministically provides gradient information with no forward-path penalty. Our analysis provides evidence that it may improve information density compared to non-quantized alternatives.
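To make the idea concrete, here is a minimal sketch of how a signed-zero ternary code could work, assuming a four-state code set {-1, -0, +0, +1} and a simple dead-zone quantizer. The abstract does not specify the paper's actual bit layout, threshold, or gradient rule, so the names and parameters below (szt_quantize, threshold=0.5, and so on) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a signed-zero ternary (SZT) code, under assumptions noted above.
import numpy as np

# Assumed 2-bit encodings for the four SZT states.
NEG_ONE, NEG_ZERO, POS_ZERO, POS_ONE = 0b00, 0b01, 0b10, 0b11

def szt_quantize(w: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Map real-valued weights to 2-bit SZT codes.

    Weights above +threshold become +1, below -threshold become -1,
    and weights in the dead zone become a zero that keeps its sign.
    """
    return np.where(
        w > threshold, POS_ONE,
        np.where(w < -threshold, NEG_ONE,
                 np.where(w >= 0.0, POS_ZERO, NEG_ZERO)),
    ).astype(np.uint8)

def szt_forward(codes: np.ndarray) -> np.ndarray:
    """Forward-path decode: both zeros evaluate to 0.0, so the forward
    pass matches plain ternary (no forward-path penalty)."""
    return np.array([-1.0, 0.0, 0.0, 1.0])[codes]

def szt_zero_sign(codes: np.ndarray) -> np.ndarray:
    """Backward-path hint: the sign of zero deterministically records
    which side of zero the original weight fell on, information that
    plain ternary quantization discards."""
    return np.array([-1.0, -1.0, 1.0, 1.0])[codes]

if __name__ == "__main__":
    w = np.array([0.9, 0.2, -0.1, -0.7])
    codes = szt_quantize(w)
    print(szt_forward(codes))    # [ 1.  0.  0. -1.]  same as plain ternary
    print(szt_zero_sign(codes))  # [ 1.  1. -1. -1.]  extra direction bit
```

The design point the abstract highlights is visible in the sketch: the forward decode collapses both zeros to 0, so inference behaves exactly like ordinary ternary, while the retained sign of zero gives the backward pass a deterministic direction hint at no additional forward cost.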