SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs
By: Zhongren Dong, Bin Wang, Jing Han, and more
Neural speech codecs face a fundamental trade-off at low bitrates: preserving acoustic fidelity often compromises semantic richness. To address this, we introduce SACodec, a novel codec built on an asymmetric dual-quantizer that employs our proposed Semantic Anchoring mechanism. This design strategically decouples the quantization of semantic and acoustic details. Semantic anchoring is achieved via a lightweight projector that aligns acoustic features with a frozen, large-scale mHuBERT codebook, injecting linguistic priors while guaranteeing full codebook utilization. Subsequently, for acoustic details, a residual activation module with SimVQ enables a single-layer quantizer (the acoustic path) to faithfully recover fine-grained information. At just 1.5 kbps, SACodec establishes a new state of the art in both fidelity and semantics: subjective listening tests confirm that its reconstruction quality is perceptually very close to ground-truth audio, while its tokens show substantially improved semantic richness in downstream tasks.
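The abstract describes the asymmetric dual quantizer only at a high level. The NumPy sketch below illustrates one plausible reading of it and is not the authors' implementation. Assumptions: random arrays stand in for the frozen mHuBERT codebook; the "lightweight projector" is modeled as a single linear map (`proj`) plus a hypothetical map back (`back`); the acoustic path is assumed to quantize the residual left by the semantic path; and the SimVQ-style codebook follows our reading of SimVQ (a frozen random basis `c0` multiplied by a learnable linear map `w`). The names `nearest_code`, `DualQuantizerSketch`, and `bitrate_bps` are hypothetical, and nothing is trained here.

```python
import math
import numpy as np


def nearest_code(x, codebook):
    """Nearest-neighbour lookup: for each row of x (T, D), return the index and
    vector of the closest codebook row (K, D) under squared L2 distance."""
    dist = ((x ** 2).sum(axis=1, keepdims=True)
            - 2.0 * x @ codebook.T
            + (codebook ** 2).sum(axis=1))
    idx = dist.argmin(axis=1)
    return idx, codebook[idx]


class DualQuantizerSketch:
    """Hypothetical two-path quantizer: a semantic path anchored to a frozen
    codebook plus a single-layer SimVQ-style acoustic path on the residual."""

    def __init__(self, feat_dim, sem_codebook, acoustic_size, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen semantic codebook (K_s, D_s); stands in for mHuBERT cluster centroids.
        self.sem_codebook = sem_codebook
        sem_dim = sem_codebook.shape[1]
        # Assumed lightweight projector: one linear map into the codebook space,
        # plus an assumed linear map back to the encoder feature space.
        self.proj = rng.normal(0.0, 1.0 / math.sqrt(feat_dim), (feat_dim, sem_dim))
        self.back = rng.normal(0.0, 1.0 / math.sqrt(sem_dim), (sem_dim, feat_dim))
        # SimVQ-style acoustic codebook: frozen random basis C0 times a linear map W.
        # In training only W would be updated; here W stays at its identity init.
        self.c0 = rng.normal(size=(acoustic_size, feat_dim))
        self.w = np.eye(feat_dim)

    def encode(self, h):
        """h: encoder features (T, feat_dim) -> (semantic ids, acoustic ids, reconstruction)."""
        # Semantic path: project, then snap to the nearest frozen anchor.
        sem_idx, sem_vec = nearest_code(h @ self.proj, self.sem_codebook)
        sem_part = sem_vec @ self.back
        # Acoustic path (assumed residual formulation): quantize what the
        # semantic path left unexplained with the single SimVQ-style layer.
        acoustic_codebook = self.c0 @ self.w
        ac_idx, ac_vec = nearest_code(h - sem_part, acoustic_codebook)
        return sem_idx, ac_idx, sem_part + ac_vec


def bitrate_bps(frame_rate_hz, codebook_sizes):
    """Token bitrate: frames per second times log2(K) bits per codebook."""
    return frame_rate_hz * sum(math.log2(k) for k in codebook_sizes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quantizer = DualQuantizerSketch(feat_dim=12,
                                    sem_codebook=rng.normal(size=(64, 8)),
                                    acoustic_size=32)
    sem_ids, ac_ids, recon = quantizer.encode(rng.normal(size=(20, 12)))
    print(sem_ids.shape, ac_ids.shape, recon.shape)  # (20,) (20,) (20, 12)
    print(bitrate_bps(50, [1024, 1024]))  # 1000.0 (illustrative values only)
```

The asymmetry shows up in what is trainable: in this sketch the semantic codebook is fixed and only the projector would adapt, while the acoustic path would learn through `w` alone. The frame rate and codebook sizes passed to `bitrate_bps` are made up for illustration and are not SACodec's configuration.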
Similar Papers
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
Audio and Speech Processing
Makes computers understand and create speech better.
SecoustiCodec: Cross-Modal Aligned Streaming Single-Codecbook Speech Codec
Audio and Speech Processing
Makes computers understand and recreate speech better.
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine
Sound
Makes computers understand speech with less data.