Robust Residual Finite Scalar Quantization for Neural Compression
By: Xiaoxu Zhu
Potential Business Impact:
Makes compressed pictures look clearer on computers.
Finite Scalar Quantization (FSQ) has emerged as a promising alternative to Vector Quantization (VQ) in neural compression, offering simplified training and improved stability. However, naive application of FSQ in residual quantization frameworks suffers from the residual magnitude decay problem, where subsequent FSQ layers receive progressively weaker signals, severely limiting their effectiveness. We propose Robust Residual Finite Scalar Quantization (RFSQ), a general framework that addresses this fundamental limitation through two novel conditioning strategies: learnable scaling factors and invertible layer normalization. Our approach maintains the simplicity of FSQ while enabling effective multi-stage residual quantization. Comprehensive experiments on ImageNet demonstrate that RFSQ variants significantly outperform strong baselines including VQ-EMA, FSQ, and LFQ, achieving up to 45% improvement in perceptual loss and 28.7% reduction in L1 reconstruction error. The proposed LayerNorm strategy shows the most consistent improvements across different configurations, establishing RFSQ as a superior quantization method for neural compression.
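Based only on the abstract, the following is a minimal PyTorch-style sketch of residual FSQ with the two conditioning strategies described (learnable per-stage scaling and invertible layer normalization). The class names, the single shared level count, and the straight-through rounding details are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class FSQ(nn.Module):
    """Finite scalar quantization: bound each channel, then round to a fixed grid.
    Real FSQ uses a per-channel level list; a single level count keeps the sketch short."""
    def __init__(self, num_levels=8):
        super().__init__()
        self.half = (num_levels - 1) / 2

    def forward(self, z):
        bounded = torch.tanh(z) * self.half
        # Straight-through estimator: round in the forward pass, identity gradient.
        quantized = bounded + (bounded.round() - bounded).detach()
        return quantized / self.half

class RFSQ(nn.Module):
    """Residual FSQ: each stage quantizes what the previous stages left behind.
    conditioning='scale' -> learnable per-stage scaling factors (assumed form)
    conditioning='lnorm' -> invertible layer normalization of the residual (assumed form)"""
    def __init__(self, num_stages=3, num_levels=8, conditioning="lnorm", eps=1e-6):
        super().__init__()
        self.stages = nn.ModuleList(FSQ(num_levels) for _ in range(num_stages))
        self.conditioning = conditioning
        self.eps = eps
        if conditioning == "scale":
            # One learnable gain per stage to counteract residual magnitude decay.
            self.scales = nn.Parameter(torch.ones(num_stages))

    def forward(self, z):
        residual = z
        out = torch.zeros_like(z)
        for i, fsq in enumerate(self.stages):
            if self.conditioning == "scale":
                q = fsq(residual * self.scales[i]) / self.scales[i]
            else:
                # Normalize the shrinking residual so the FSQ grid stays well used,
                # then invert the normalization so the quantized codes sum back correctly.
                mu = residual.mean(dim=-1, keepdim=True)
                sigma = residual.std(dim=-1, keepdim=True) + self.eps
                q = fsq((residual - mu) / sigma) * sigma + mu
            out = out + q
            residual = residual - q
        return out

# Usage: quantize a batch of encoder latents of shape (batch, tokens, channels).
z = torch.randn(4, 16, 64)
z_hat = RFSQ(num_stages=3, conditioning="lnorm")(z)
print(z_hat.shape)  # torch.Size([4, 16, 64])

The key idea in both variants is to re-condition each stage's input so later FSQ layers see signals of usable magnitude, while keeping the operation invertible so the per-stage codes still sum to a faithful reconstruction.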
Similar Papers
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates
Sound
Makes voices sound clear even with bad internet.
SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions
Machine Learning (CS)
Makes AI smaller and faster for phones.