Diffusion Timbre Transfer Via Mutual Information Guided Inpainting
By: Ching Ho Lee, Javier Nistal, Stefan Lattner, and more
Potential Business Impact:
Changes the instruments heard in existing music without retraining the model.
We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic and rhythmic structure during reverse diffusion. The method operates directly on audio latents and is compatible with text/audio conditioning (e.g., CLAP). We discuss design choices, analyze trade-offs between timbral change and structural preservation, and show that simple inference-time controls can meaningfully steer pre-trained models for style-transfer use cases.
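To make the two controls concrete, below is a minimal sketch of how they could fit into a standard reverse-diffusion loop. It assumes a DDIM-style sampler over audio latents of shape (batch, channels, time) and per-channel mutual-information scores estimated offline against instrument labels; `denoise_fn`, `mi_scores`, `alphas_cumprod`, `top_k`, and `clamp_until` are illustrative placeholders, not the authors' actual API.

```python
# Sketch of MI-guided noise injection + early-step structure clamping.
# Assumptions: eps-prediction denoiser, DDPM cumulative-alpha schedule,
# precomputed per-channel MI scores. Not the paper's reference code.
import torch

def timbre_transfer(
    z_in: torch.Tensor,           # encoded input audio latent, shape (B, C, T)
    denoise_fn,                   # eps-prediction model: (z_t, t) -> predicted noise
    alphas_cumprod: torch.Tensor, # noise schedule, shape (num_steps,)
    mi_scores: torch.Tensor,      # per-channel MI with instrument identity, shape (C,)
    top_k: int = 16,              # how many timbre-informative channels to renoise
    clamp_until: int = 20,        # reverse steps during which structure is re-imposed
) -> torch.Tensor:
    num_steps = alphas_cumprod.shape[0]

    # (i) Dimension-wise noise injection: renoise only the channels most
    # informative of instrument identity; structure channels stay intact.
    timbre_ch = torch.topk(mi_scores, top_k).indices
    mask = torch.zeros(z_in.shape[1], dtype=torch.bool)
    mask[timbre_ch] = True
    mask = mask.view(1, -1, 1)  # broadcast over batch and time

    t0 = num_steps - 1
    a0 = alphas_cumprod[t0]
    noise = torch.randn_like(z_in)
    z_t = torch.where(mask, a0.sqrt() * z_in + (1 - a0).sqrt() * noise, z_in)

    # Reverse diffusion with (ii) early-step clamping: for the first
    # `clamp_until` steps, overwrite the structure channels with a noised
    # copy of the input latent so melody and rhythm are re-imposed.
    for i, t in enumerate(range(t0, -1, -1)):
        a_t = alphas_cumprod[t]
        eps = denoise_fn(z_t, torch.tensor([t]))
        z0_hat = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        if t > 0:
            a_prev = alphas_cumprod[t - 1]
            z_t = a_prev.sqrt() * z0_hat + (1 - a_prev).sqrt() * eps  # DDIM step
            if i < clamp_until:
                z_ref = a_prev.sqrt() * z_in + (1 - a_prev).sqrt() * torch.randn_like(z_in)
                z_t = torch.where(mask, z_t, z_ref)  # clamp structure channels
        else:
            z_t = z0_hat
    return z_t
```

In this reading of the abstract, `top_k` and `clamp_until` are the knobs behind the trade-off the paper analyzes: a larger `top_k` or shorter clamping window yields stronger timbral change, while a smaller `top_k` or longer clamping window better preserves the input's melodic and rhythmic structure.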
Similar Papers
Similarity-Guided Diffusion for Long-Gap Music Inpainting
Audio and Speech Processing
Fixes long missing parts in music recordings.
Token-based Audio Inpainting via Discrete Diffusion
Sound
Fixes broken music by filling in missing sounds.
InpDiffusion: Image Inpainting Localization via Conditional Diffusion Models
CV and Pattern Recognition
Finds hidden edits in pictures better.