Token-based Audio Inpainting via Discrete Diffusion
By: Tali Dror, Iftach Shoham, Moshe Buchris, and more
Potential Business Impact:
Fixes broken music by filling in missing sounds.
Audio inpainting refers to the task of reconstructing missing segments in corrupted audio recordings. While prior approaches, including waveform- and spectrogram-based diffusion models, have shown promising results for short gaps, they often degrade in quality when gaps exceed 100 milliseconds (ms). In this work, we introduce a novel inpainting method based on discrete diffusion modeling, which operates over tokenized audio representations produced by a pre-trained audio tokenizer. Our approach models the generative process directly in the discrete latent space, enabling stable and semantically coherent reconstruction of missing audio. We evaluate the method on the MusicNet dataset using both objective and perceptual metrics across gap durations of up to 300 ms. We further evaluate our approach on the MTG dataset, extending the gap duration to 500 ms. Experimental results demonstrate that our method achieves competitive or superior performance compared to existing baselines, particularly for longer gaps, offering a robust solution for restoring degraded musical recordings. Audio examples of our proposed method can be found at https://iftach21.github.io/
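To make the setup concrete, the sketch below illustrates the general shape of absorbing-state discrete diffusion inpainting over a token sequence: gap positions are replaced by a reserved mask token and then iteratively revealed using a denoiser's predictions. Everything here is a toy stand-in; the codebook size, the `toy_denoiser`, and the reveal schedule are illustrative assumptions, not the paper's actual tokenizer, model, or sampler.

```python
import numpy as np

VOCAB = 16   # toy codebook size (real audio tokenizers use far larger codebooks)
MASK = VOCAB # reserved "masked" token id, outside the codebook

def toy_denoiser(tokens):
    """Stand-in for a learned denoiser: predicts a token per masked position.
    Here it simply copies the nearest unmasked neighbor (illustrative only)."""
    out = tokens.copy()
    known = np.where(tokens != MASK)[0]
    for i in np.where(tokens == MASK)[0]:
        nearest = known[np.argmin(np.abs(known - i))]
        out[i] = tokens[nearest]
    return out

def inpaint(tokens, gap, steps=4, seed=0):
    """Mask the gap, then iteratively reveal masked positions with the
    denoiser's predictions -- the basic absorbing-diffusion sampling loop."""
    rng = np.random.default_rng(seed)
    x = tokens.copy()
    x[gap] = MASK
    for _ in range(steps):
        masked = np.where(x == MASK)[0]
        if masked.size == 0:
            break
        pred = toy_denoiser(x)
        # reveal about half of the still-masked positions each step
        k = max(1, masked.size // 2)
        reveal = rng.choice(masked, size=k, replace=False)
        x[reveal] = pred[reveal]
    # final pass: fill anything still masked
    x[x == MASK] = toy_denoiser(x)[x == MASK]
    return x

tokens = np.arange(12) % VOCAB  # toy token sequence from a tokenizer
gap = np.arange(4, 8)           # positions to reconstruct
restored = inpaint(tokens, gap)
assert not np.any(restored == MASK)  # every gap position is filled
```

A real system would replace `toy_denoiser` with a trained transformer predicting categorical distributions over the codebook, and decode the restored token sequence back to audio with the tokenizer's decoder.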
Similar Papers
Similarity-Guided Diffusion for Long-Gap Music Inpainting
Audio and Speech Processing
Fixes long missing parts in music recordings.
Transient Noise Removal via Diffusion-based Speech Inpainting
Audio and Speech Processing
Fixes broken or missing speech in recordings.
Towards Seamless Borders: A Method for Mitigating Inconsistencies in Image Inpainting and Outpainting
CV and Pattern Recognition
Fixes broken pictures by filling in missing parts.