Score: 1

MelCap: A Unified Single-Codebook Neural Codec for High-Fidelity Audio Compression

Published: October 2, 2025 | arXiv ID: 2510.01903v2

By: Jingyi Li, Zhiyuan Zhao, Yunfei Liu, and more

Potential Business Impact:

Makes music and speech sound clear with less data.

Business Areas:
Audio Media and Entertainment, Music and Audio

Neural audio codecs have recently emerged as powerful tools for high-quality and low-bitrate audio compression, leveraging deep generative models to learn latent representations of audio signals. However, existing approaches either rely on a single quantizer that handles only the speech domain, or on multiple quantizers that are not well suited for downstream tasks. To address this issue, we propose MelCap, a unified "one-codebook-for-all" neural codec that effectively handles speech, music, and general sound. By decomposing audio reconstruction into two stages, our method preserves more acoustic details than previous single-codebook approaches, while achieving performance comparable to mainstream multi-codebook methods. In the first stage, audio is transformed into mel-spectrograms, which are compressed and quantized into compact single tokens using a 2D tokenizer. A perceptual loss is further applied to mitigate the over-smoothing artifacts observed in spectrogram reconstruction. In the second stage, a vocoder recovers waveforms from the discrete mel tokens in a single forward pass, enabling real-time decoding. Both objective and subjective evaluations demonstrate that MelCap achieves quality comparable to state-of-the-art multi-codebook codecs, while retaining the computational simplicity of a single-codebook design, thereby providing an effective representation for downstream tasks.
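The abstract describes a two-stage pipeline: a 2D tokenizer compresses mel-spectrograms into discrete tokens drawn from a single codebook, and a vocoder maps the reconstructed mel-spectrogram back to a waveform in one forward pass. The sketch below illustrates that structure in PyTorch; the module names, layer sizes, nearest-neighbor codebook lookup, and the stub vocoder are illustrative assumptions, not the paper's actual MelCap implementation.

```python
# Minimal sketch of a two-stage, single-codebook mel codec.
# All module names, sizes, and the VQ scheme are illustrative assumptions;
# the paper's actual MelCap architecture is not reproduced here.
import torch
import torch.nn as nn
import torchaudio

class MelTokenizer2D(nn.Module):
    """Stage 1 (hypothetical): encode a mel-spectrogram with 2D convolutions,
    quantize each latent vector against one shared codebook, and decode back."""
    def __init__(self, latent_dim=64, codebook_size=8192):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(32, latent_dim, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 32, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # A single codebook shared across speech, music, and general sound.
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def quantize(self, z):
        # Nearest-neighbor lookup: replace each latent vector with the closest
        # codebook entry and return the integer token ids.
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)        # (B*H*W, C)
        dists = torch.cdist(flat, self.codebook.weight)     # (B*H*W, K)
        ids = dists.argmin(dim=-1)                          # discrete tokens
        z_q = self.codebook(ids).view(b, h, w, c).permute(0, 3, 1, 2)
        return z_q, ids.view(b, h, w)

    def forward(self, mel):
        z = self.encoder(mel.unsqueeze(1))                  # (B, C, H, W)
        z_q, tokens = self.quantize(z)
        mel_hat = self.decoder(z_q).squeeze(1)              # reconstructed mel
        return mel_hat, tokens

class StubVocoder(nn.Module):
    """Stage 2 (placeholder): map the reconstructed mel-spectrogram to a
    waveform in a single forward pass. A real system uses a neural vocoder."""
    def __init__(self, n_mels=80, hop_length=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, hop_length)

    def forward(self, mel):
        # (B, n_mels, T) -> (B, T * hop_length); purely illustrative.
        frames = self.proj(mel.transpose(1, 2))
        return frames.reshape(mel.size(0), -1)

if __name__ == "__main__":
    wav = torch.randn(1, 16000)  # 1 s of dummy audio at 16 kHz
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=16000, n_fft=1024, hop_length=256, n_mels=80
    )(wav)
    mel = torch.log(mel.clamp(min=1e-5))                    # log-mel features

    tokenizer, vocoder = MelTokenizer2D(), StubVocoder()
    mel_hat, tokens = tokenizer(mel)    # stage 1: mel -> discrete tokens -> mel
    wav_hat = vocoder(mel_hat)          # stage 2: mel -> waveform, one pass
    print(tokens.shape, wav_hat.shape)
```

In this sketch the perceptual loss mentioned in the abstract would be added as an extra training objective on `mel_hat`; it is omitted here since the paper does not specify its form in this summary.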

Page Count
14 pages

Category
Computer Science:
Sound