RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
By: Liting Gao, Yi Yuan, Yaru Chen, and more
Potential Business Impact:
Changes sounds in audio using just words.
Diffusion models have shown remarkable progress in text-to-audio generation. However, text-guided audio editing remains in its early stages. This task focuses on modifying the target content within an audio signal while preserving the rest, and thus demands precise localization and faithful editing according to the text prompt. Existing training-based and zero-shot methods, which rely on full captions or costly optimization, often struggle with complex edits or lack practicality. In this work, we propose an efficient end-to-end diffusion framework for audio editing based on rectified flow matching, and we construct a dataset of overlapping multi-event audio to support training and benchmarking in complex scenarios. Experiments show that our model achieves faithful semantic alignment without requiring auxiliary captions or masks, while maintaining competitive editing quality across metrics.
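The core idea behind rectified flow matching is to train a model to predict the constant velocity of a straight-line path between a noise sample and a data sample. The sketch below is a minimal illustration of that objective only; the paper's actual architecture, text conditioning, and audio latent pipeline are not shown, and all names here (`rfm_training_target`, `rfm_loss`) are hypothetical.

```python
import numpy as np

def rfm_training_target(x0, x1, t):
    """Straight-line interpolation path and its constant velocity target.

    x0: noise sample, x1: data sample (e.g., an audio latent),
    t: scalar time in [0, 1].
    """
    x_t = (1.0 - t) * x0 + t * x1  # point on the straight-line path
    v_target = x1 - x0             # rectified flow's velocity target
    return x_t, v_target

def rfm_loss(model, x0, x1, t):
    """MSE between the model's predicted velocity and the target."""
    x_t, v_target = rfm_training_target(x0, x1, t)
    v_pred = model(x_t, t)
    return float(np.mean((v_pred - v_target) ** 2))

# Toy usage with an oracle "model" that already outputs the target velocity.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)  # stand-in for a noise latent
x1 = rng.standard_normal(8)  # stand-in for an audio latent
oracle = lambda x_t, t: x1 - x0
print(rfm_loss(oracle, x0, x1, 0.3))  # 0.0 for the oracle model
```

In practice the oracle is replaced by a learned network conditioned on the editing text prompt, and sampling integrates the predicted velocity field from noise toward the edited audio.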
Similar Papers
MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers
Sound
Changes real music with just words.