Soft Disentanglement in Frequency Bands for Neural Audio Codecs
By: Benoit Ginies, Xiaoyu Bie, Olivier Fercoq and more
Potential Business Impact:
Makes the audio features computers learn easier to interpret and better at reconstructing sound.
In neural audio feature extraction, ensuring that representations capture disentangled information is crucial for model interpretability. However, existing disentanglement methods often rely on assumptions that depend heavily on data characteristics or on specific tasks. In this work, we introduce a generalizable approach for learning disentangled features within a neural architecture. Our method applies spectral decomposition to time-domain signals, followed by a multi-branch audio codec that operates on the decomposed components. Empirical evaluations show that our approach achieves better reconstruction and perceptual performance than a state-of-the-art baseline, while also offering potential advantages for inpainting tasks.
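As a rough illustration of the first stage described in the abstract (spectral decomposition of a time-domain signal into components that separate codec branches could then process), here is a minimal sketch. It is not the authors' implementation: the hard FFT masks, the function name split_into_bands, and the band edges of 1000 Hz and 4000 Hz are all assumptions made for illustration, and the paper's "soft" decomposition may differ substantially. The codec branches themselves are not shown.

```python
import numpy as np


def split_into_bands(signal, sample_rate, band_edges_hz):
    """Split a 1-D signal into band-limited components using FFT masks.

    band_edges_hz is an increasing list of cutoff frequencies (hypothetical
    values chosen by the caller). It defines len(band_edges_hz) + 1 bands
    that together cover 0 Hz up to the Nyquist frequency.
    """
    signal = np.asarray(signal, dtype=np.float64)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)

    # Each frequency bin falls in exactly one [low, high) interval,
    # so the band signals sum back to the original signal.
    edges = [0.0, *band_edges_hz, np.inf]
    bands = []
    for low, high in zip(edges[:-1], edges[1:]):
        mask = (freqs >= low) & (freqs < high)
        bands.append(np.fft.irfft(spectrum * mask, n=signal.size))
    return np.stack(bands)


# Toy input: one second of a 200 Hz tone plus a weaker 3000 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

# Assumed band edges: low (< 1 kHz), mid (1-4 kHz), high (>= 4 kHz).
bands = split_into_bands(signal, sample_rate, band_edges_hz=[1000.0, 4000.0])

# In a multi-branch design, each row of `bands` would go to its own branch;
# here we only check that the decomposition loses no information.
reconstruction = bands.sum(axis=0)
```

Because the masks partition the spectrum, summing the bands recovers the input up to floating-point error, so any loss in a full system would come from the codec branches rather than from the decomposition step.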
Similar Papers
Désentrelacement Fréquentiel Doux pour les Codecs Audio Neuronaux
Sound
Makes computer sound understanding clearer and better.
Harmonic-Percussive Disentangled Neural Audio Codec for Bandwidth Extension
Sound
Makes old recordings sound clear and new.
Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR
Computation and Language
Cleans noisy speech for better understanding.