PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
By: Jiatong Shi, Haoran Wang, William Chen and more
Potential Business Impact:
Makes phone calls sound clearer, even with background noise.
Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framework that guides multi-stage quantization using a pre-trained speech enhancement model. The first quantization stage reconstructs low-entropy, denoised speech embeddings, while subsequent stages encode residual high-entropy components. This design improves training stability significantly. Experiments demonstrate that PURE consistently outperforms conventional RVQ-based codecs in reconstruction and downstream speech language model-based text-to-speech, particularly under noisy training conditions.
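To make the staged decomposition concrete, here is a minimal NumPy sketch of residual vector quantization, plus one way the first stage could be tied to a denoised target. It is an illustration under stated assumptions, not the authors' implementation: the function names, the mean-squared-error losses, and the idea that guidance is applied as a separate first-stage loss are all hypothetical, since the abstract does not specify the architecture or the training objective.

```python
import numpy as np


def nearest_code(x, codebook):
    """Return, for each row of x, the closest codebook row and its index."""
    # dists[i, j] = squared Euclidean distance between x[i] and codebook[j]
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def rvq_encode(x, codebooks):
    """Plain RVQ: each stage quantizes what the earlier stages left over."""
    residual = np.array(x, dtype=float)  # private float copy of the input
    recon = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        quantized, idx = nearest_code(residual, codebook)
        recon += quantized      # running reconstruction
        residual -= quantized   # what is left for the next stage
        indices.append(idx)
    return recon, indices


def guided_stage_losses(x_noisy, x_denoised, codebooks):
    """Hypothetical guidance losses (an assumption, not from the paper).

    Stage 1 alone is compared with the denoised embedding; the sum of all
    stages is compared with the original (possibly noisy) embedding.
    """
    recon, indices = rvq_encode(x_noisy, codebooks)
    first_stage = codebooks[0][indices[0]]
    loss_stage1 = float(np.mean((first_stage - x_denoised) ** 2))
    loss_full = float(np.mean((recon - x_noisy) ** 2))
    return loss_stage1, loss_full


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy codebooks of 8 codes each over 4-dimensional embeddings.
    codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]
    x_noisy = rng.normal(size=(5, 4))
    # Stand-in for the output of a pre-trained enhancement model.
    x_denoised = 0.5 * x_noisy
    print(guided_stage_losses(x_noisy, x_denoised, codebooks))
```

In a real codec the codebooks and encoder would be learned jointly and the denoised target would come from the pre-trained speech enhancement model. The toy `x_denoised` above only stands in for that output.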
Similar Papers
MBCodec: Thorough disentangle for high-fidelity audio compression
Sound
Makes computer voices sound more real.
Improving Test-Time Performance of RVQ-based Neural Codecs
Audio and Speech Processing
Makes music sound better from fewer computer codes.
A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication
Sound
Makes online calls sound clearer, faster, and cheaper.