Analysing the Language of Neural Audio Codecs
By: Joonyong Park, Shinnosuke Takamichi, David M. Chan and more
Potential Business Impact:
Makes computer speech sound more real.
This study presents a comparative analysis of the statistical and linguistic properties of neural audio codecs (NACs). We investigate discrete speech tokens produced by various NAC models, examining their adherence to linguistic statistical laws such as Zipf's law and Heaps' law, as well as their entropy and redundancy. To assess how these token-level properties relate to semantic and acoustic preservation in synthesized speech, we evaluate intelligibility using automatic speech recognition error rates and quality using the UTMOS score. Our results reveal that NAC tokens, particularly 3-grams, exhibit language-like statistical patterns. Moreover, these properties, together with measures of information content, correlate with improved performance in speech recognition and resynthesis tasks. These findings offer insights into the structure of NAC token sequences and inform the design of more effective generative speech models.
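As a rough illustration of the token-level statistics named in the abstract, the following minimal sketch estimates a Zipf exponent, a Heaps exponent, and entropy with redundancy for a sequence of integer codec tokens. It is not the authors' code: the function names, the least-squares log-log fit, and the definition of redundancy as 1 - H / log2(codebook size) are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); needs >= 2 distinct x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def zipf_exponent(units):
    """Estimate s in frequency ~ rank^(-s) from the rank-frequency curve."""
    freqs = sorted(Counter(units).values(), reverse=True)
    ranks = range(1, len(freqs) + 1)
    return -loglog_slope(ranks, freqs)


def heaps_exponent(units):
    """Estimate beta in vocabulary_size ~ length^beta from vocabulary growth."""
    seen, lengths, vocab = set(), [], []
    for i, unit in enumerate(units, start=1):
        seen.add(unit)
        lengths.append(i)
        vocab.append(len(seen))
    return loglog_slope(lengths, vocab)


def entropy_and_redundancy(units, codebook_size):
    """Shannon entropy in bits, and redundancy relative to a uniform codebook.

    Assumption: redundancy = 1 - H / log2(codebook_size).
    """
    counts = Counter(units)
    total = sum(counts.values())
    h = -sum(c / total * math.log2(c / total) for c in counts.values())
    return h, 1 - h / math.log2(codebook_size)
```

To analyse 3-grams rather than single tokens, one would pass `ngrams(tokens, 3)` to `zipf_exponent` or `heaps_exponent`; an exponent near 1 on the rank-frequency curve is the classic Zipf-like signature in natural-language text.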
Similar Papers
Modeling strategies for speech enhancement in the latent space of a neural audio codec
Sound
Makes noisy speech clear by learning its hidden sounds.
Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
Sound
Makes voice checks work better with less data.
AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
Sound
Helps computers understand sounds and music better.