Harmonic-Percussive Disentangled Neural Audio Codec for Bandwidth Extension
By: Benoît Giniès, Xiaoyu Bie, Olivier Fercoq and more
Potential Business Impact:
Makes old recordings sound clear and new.
Bandwidth extension, the task of reconstructing the high-frequency components of an audio signal from its low-pass counterpart, is a long-standing problem in audio processing. While traditional approaches have evolved alongside the broader trends in signal processing, recent advances in neural architectures have significantly improved performance across a wide range of audio tasks. In this work, we extend these advances by framing bandwidth extension as an audio token prediction problem. Specifically, we train a transformer-based language model on the discrete representations produced by a disentangled neural audio codec, where the disentanglement is guided by a Harmonic-Percussive decomposition of the input signals, highlighting spectral structures particularly relevant for bandwidth extension. Our approach introduces a novel codec design that explicitly accounts for the downstream token prediction task, enabling a more effective coupling between codec structure and transformer modeling. This joint design yields high-quality reconstructions of the original signal, as measured by both objective metrics and subjective evaluations. These results highlight the importance of aligning codec disentanglement and representation learning with the generative modeling stage, and demonstrate the potential of global, representation-aware design for advancing bandwidth extension.
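The abstract does not say how the Harmonic-Percussive decomposition is computed. As a rough illustration only, the sketch below uses median-filtering separation, a common technique that is assumed here and may differ from the authors' method. Harmonic content forms horizontal ridges in a magnitude spectrogram (stable over time), while percussive content forms vertical ridges (broadband at one instant), so a median filter along time keeps the former and a median filter along frequency keeps the latter. The function names, the kernel size, and the toy spectrogram are all hypothetical.

```python
from statistics import median


def median_filter_1d(values, kernel):
    """Median-filter a list; the window is clipped at the edges."""
    half = kernel // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def hpss_masks(mag, kernel=5, eps=1e-10):
    """Split a magnitude spectrogram mag[f][t] into harmonic and
    percussive parts with soft masks (assumed technique, see lead-in)."""
    n_freq, n_time = len(mag), len(mag[0])
    # Harmonic estimate: smooth each frequency row along time.
    harm = [median_filter_1d(row, kernel) for row in mag]
    # Percussive estimate: smooth each time column along frequency.
    cols = [median_filter_1d([mag[f][t] for f in range(n_freq)], kernel)
            for t in range(n_time)]
    harmonic, percussive = [], []
    for f in range(n_freq):
        h_row, p_row = [], []
        for t in range(n_time):
            h2 = harm[f][t] ** 2
            p2 = cols[t][f] ** 2
            mask_h = h2 / (h2 + p2 + eps)  # soft mask in [0, 1]
            h_row.append(mask_h * mag[f][t])
            p_row.append((1.0 - mask_h) * mag[f][t])
        harmonic.append(h_row)
        percussive.append(p_row)
    return harmonic, percussive


# Toy spectrogram: one sustained tone (row 4) and one click (column 4).
mag = [[1.0 if (f == 4 or t == 4) else 0.0 for t in range(9)]
       for f in range(9)]
H, P = hpss_masks(mag)
```

On this toy input the sustained tone ends up almost entirely in `H` and the click in `P`, and the two parts sum back to `mag` by construction. A real pipeline would apply the masks to a complex STFT and invert it to obtain two waveforms.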
Similar Papers
Soft Disentanglement in Frequency Bands for Neural Audio Codecs
Sound
Makes computer sound understanding clearer and better.
Désentrelacement Fréquentiel Doux pour les Codecs Audio Neuronaux
Sound
Makes computer sound understanding clearer and better.
Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR
Computation and Language
Cleans noisy speech for better understanding.