Semantic Codebooks as Effective Priors for Neural Speech Compression
By: Liuyang Bai, Weiyi Lu, Li Guo
Potential Business Impact:
Makes voices sound clear with less data.
Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on downstream recognition tasks. We propose SemDAC, a semantic-aware neural audio codec that leverages semantic codebooks as effective priors for speech compression. In SemDAC, the first quantizer in a residual vector quantization (RVQ) stack is distilled from HuBERT features to produce semantic tokens that capture phonetic content, while subsequent quantizers model residual acoustics. The decoder uses FiLM conditioning on the semantic tokens during reconstruction, allowing the acoustic codebooks to be used more efficiently. Despite its simplicity, this design proves highly effective: SemDAC outperforms DAC across perceptual metrics and achieves lower WER when running Whisper on reconstructed speech, all while operating at substantially lower bitrates (e.g., 0.95 kbps vs. 2.5 kbps for DAC). These results demonstrate that semantic codebooks provide an effective inductive bias for neural speech compression, producing compact yet recognition-friendly representations.
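To make the architecture concrete, below is a minimal PyTorch sketch of the semantic-first RVQ idea the abstract describes: a first quantizer distilled toward HuBERT features, later quantizers coding the acoustic residual, and a FiLM module for conditioning decoder features on the semantic tokens. All names (`VectorQuantizer`, `FiLMConditioner`, `SemanticFirstRVQ`), the dimensions and codebook sizes, and the MSE distillation proxy are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """One RVQ stage: nearest-codeword lookup with a straight-through estimator."""

    def __init__(self, codebook_size: int, dim: int):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, time, dim); distances to every codeword: (batch, time, K)
        codes = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        idx = torch.cdist(z, codes).argmin(dim=-1)      # token indices, (batch, time)
        q = self.codebook(idx)                          # quantized vectors
        return z + (q - z).detach(), idx                # straight-through gradient


class FiLMConditioner(nn.Module):
    """Feature-wise linear modulation: scale/shift decoder features by a condition."""

    def __init__(self, cond_dim: int, feat_dim: int):
        super().__init__()
        self.proj = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor):
        gamma, beta = self.proj(cond).chunk(2, dim=-1)
        return gamma * feats + beta


class SemanticFirstRVQ(nn.Module):
    """RVQ stack whose first stage is distilled toward HuBERT features."""

    def __init__(self, dim: int = 256, codebook_size: int = 1024, n_acoustic: int = 3):
        super().__init__()
        self.semantic_vq = VectorQuantizer(codebook_size, dim)
        self.acoustic_vqs = nn.ModuleList(
            [VectorQuantizer(codebook_size, dim) for _ in range(n_acoustic)]
        )

    def forward(self, z: torch.Tensor, hubert_feats=None):
        q_sem, _ = self.semantic_vq(z)
        # Distillation pulls the first codebook toward HuBERT features so its
        # tokens carry phonetic content (MSE here is a stand-in for the paper's loss).
        distill_loss = (
            F.mse_loss(q_sem, hubert_feats) if hubert_feats is not None
            else z.new_zeros(())
        )
        residual, q_total = z - q_sem, q_sem
        for vq in self.acoustic_vqs:  # later stages model residual acoustics
            q, _ = vq(residual)
            residual = residual - q
            q_total = q_total + q
        return q_total, q_sem, distill_loss
```

At decode time, the FiLM module would modulate intermediate decoder activations with the semantic tokens, e.g. `film(decoder_feats, q_sem)`, so the acoustic stages only need to spend bits on what the phonetic content cannot predict.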
Similar Papers
SACodec: Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs
Sound
Makes AI understand speech better, even with less data.
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
Audio and Speech Processing
Makes computers understand and create speech better.
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine
Sound
Makes computers understand speech with less data.