Improving Test-Time Performance of RVQ-based Neural Codecs
By: Hyeongju Kim, Junhyeok Lee, Jacob Morton and more
Potential Business Impact:
Makes music sound better from fewer computer codes.
The residual vector quantization (RVQ) technique plays a central role in recent advances in neural audio codecs. These models effectively synthesize high-fidelity audio from a limited number of codes due to the hierarchical structure among quantization levels. In this paper, we propose an encoding algorithm to further enhance the synthesis quality of RVQ-based neural codecs at test time. First, we point out the suboptimal nature of quantized vectors generated by conventional methods and demonstrate that quantization error can be mitigated by selecting a different set of codes. We then present our encoding algorithm, designed to identify a set of discrete codes that achieves a lower quantization error. Finally, we apply the proposed method to pre-trained models and evaluate its efficacy using diverse metrics. Our experimental findings validate that our method not only reduces quantization errors, but also improves synthesis quality.
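The claim that conventional encoding can be suboptimal is easy to see on a toy case. The sketch below is only an illustration under stated assumptions: the abstract does not say which search procedure the authors use, so the beam search here is a stand-in, and the names `greedy_encode`, `beam_encode`, the toy codebooks, and the `beam_size` parameter are all hypothetical. It compares the usual level-by-level nearest-codeword choice with a search that keeps several partial code sequences alive.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode(codebooks, codes):
    """Reconstruct a vector by summing the selected codeword of each level."""
    out = [0.0] * len(codebooks[0][0])
    for book, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


def greedy_encode(codebooks, x):
    """Conventional RVQ: each level picks the codeword nearest the residual."""
    residual = list(x)
    codes = []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda i: sq_dist(residual, book[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes


def beam_encode(codebooks, x, beam_size=2):
    """Assumed alternative: keep the beam_size best partial sequences per level."""
    beams = [([], list(x))]  # each entry: (codes so far, remaining residual)
    for book in codebooks:
        candidates = []
        for codes, residual in beams:
            for i, codeword in enumerate(book):
                new_residual = [r - c for r, c in zip(residual, codeword)]
                candidates.append((codes + [i], new_residual))
        # Rank candidates by the squared norm of their remaining residual.
        candidates.sort(key=lambda cand: sum(v * v for v in cand[1]))
        beams = candidates[:beam_size]
    return beams[0][0]


# Toy two-level codebooks (hypothetical values chosen to expose the effect).
codebooks = [
    [[0.0], [1.6]],  # level 1
    [[1.0], [0.1]],  # level 2
]
x = [1.0]

greedy_codes = greedy_encode(codebooks, x)  # [1, 1]
beam_codes = beam_encode(codebooks, x)      # [0, 0]

greedy_error = sq_dist(x, decode(codebooks, greedy_codes))
beam_error = sq_dist(x, decode(codebooks, beam_codes))
```

In this toy case the greedy encoder commits to the level-1 codeword closest to the input, which leaves a residual that no level-2 codeword fits well. Keeping the locally worse level-1 choice alive lets the search find a code pair that reconstructs the input exactly. The decoder is unchanged in both cases, which is what makes this kind of improvement applicable to pre-trained models at test time.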
Similar Papers
A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication
Sound
Makes online calls sound clearer, faster, and cheaper.
MBCodec: Thorough disentangle for high-fidelity audio compression
Sound
Makes computer voices sound more real.
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates
Sound
Makes voices sound clear even with bad internet.