PhoenixCodec: Taming Neural Speech Coding for Extreme Low-Resource Scenarios
By: Zixiang Wan, Haoran Zhao, Guochang Zhang, and more
Potential Business Impact:
Makes phone calls clear using very little internet bandwidth.
This paper presents PhoenixCodec, a comprehensive neural speech coding and decoding framework designed for extremely low-resource conditions. The proposed system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints (computation below 700 MFLOPs, latency under 30 ms, and dual-rate support at 1 kbps and 6 kbps), existing methods face a trade-off between efficiency and quality. PhoenixCodec addresses these challenges by alleviating the resource scattering of conventional decoders, employing CCR to escape local optima, and enhancing robustness through fine-tuning on noisy samples. In the LRAC 2025 Challenge Track 1, the proposed system ranked third overall and achieved the best 1 kbps performance in both the real-world noise-and-reverberation test and the clean-speech intelligibility test, confirming its effectiveness.
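To make the dual-rate constraint concrete, the sketch below shows how a bitrate target translates into a per-frame bit budget for the quantizer. The abstract does not state PhoenixCodec's frame rate or quantizer layout, so the 50 Hz frame rate (20 ms frames, comfortably inside the 30 ms latency budget) is an illustrative assumption, not the paper's actual configuration.

```python
# Minimal sketch: per-frame bit budgets at the challenge's two operating points.
# Assumption: a 50 Hz frame rate (20 ms frames); PhoenixCodec's real frame rate
# and codebook layout are not given in the abstract.

def bits_per_frame(bitrate_bps: float, frame_rate_hz: float) -> float:
    """Bits the quantizer may spend on each encoded frame."""
    return bitrate_bps / frame_rate_hz

FRAME_RATE_HZ = 50.0  # assumed: 20 ms frames, within the 30 ms latency limit

for rate_bps in (1_000, 6_000):  # dual-rate support required by the track
    budget = bits_per_frame(rate_bps, FRAME_RATE_HZ)
    print(f"{rate_bps / 1000:.0f} kbps -> {budget:.0f} bits per 20 ms frame")
    # 1 kbps -> 20 bits/frame; 6 kbps -> 120 bits/frame
```

Under these assumed numbers, the 1 kbps mode leaves only about 20 bits per frame, which illustrates why the low-rate operating point is where quality and intelligibility are hardest to preserve.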
Similar Papers
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Sound
Makes talking computers understand speech better.
NanoCodec: Towards High-Quality Ultra Fast Speech LLM Inference
Audio and Speech Processing
Makes AI understand voices much faster.
U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation
Sound
Makes voices sound real with less data.