Smule Renaissance Small: Efficient General-Purpose Vocal Restoration
By: Yongyi Zang, Chris Manchester, David Young and more
Potential Business Impact:
Cleans up bad singing and talking sounds.
Vocal recordings on consumer devices commonly suffer from multiple concurrent degradations: noise, reverberation, band-limiting, and clipping. We present Smule Renaissance Small (SRS), a compact single-stage model that performs end-to-end vocal restoration directly in the complex STFT domain. By incorporating phase-aware losses, SRS can use large analysis windows for improved frequency resolution while achieving 10.5x real-time inference on an iPhone 12 CPU at 48 kHz. On the DNS 5 Challenge blind set, despite not being trained on speech, SRS outperforms a strong GAN baseline and closely matches a computationally expensive flow-matching system. To enable evaluation under realistic multi-degradation scenarios, we introduce the Extreme Degradation Bench (EDB): 87 singing and speech recordings captured under severe acoustic conditions. On EDB, SRS surpasses all open-source baselines on singing and matches commercial systems, while remaining competitive on speech despite having no speech-specific training. We release both SRS and EDB under the MIT License.
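The abstract does not spell out the loss functions or the window size, so the following is only an illustrative sketch of the ideas it names, not the authors' implementation. It contrasts a magnitude-only spectral loss with a simple phase-sensitive loss on complex STFT bins, and works through the arithmetic behind frequency resolution and the 10.5x real-time figure. The function names, the toy spectrum, and the window sizes (512 and 2048 samples) are assumptions made for illustration.

```python
import cmath
import math


def magnitude_l1(pred, target):
    """Mean absolute difference of bin magnitudes; blind to phase errors."""
    return sum(abs(abs(p) - abs(t)) for p, t in zip(pred, target)) / len(pred)


def complex_l1(pred, target):
    """Mean L1 distance over real and imaginary parts; sensitive to phase."""
    return sum(
        abs(p.real - t.real) + abs(p.imag - t.imag) for p, t in zip(pred, target)
    ) / len(pred)


def bin_spacing_hz(sample_rate, window_size):
    """Spacing between adjacent STFT frequency bins, in Hz."""
    return sample_rate / window_size


def processing_time_s(audio_seconds, realtime_factor):
    """Wall-clock time needed to process a clip at a given real-time factor."""
    return audio_seconds / realtime_factor


# Toy "spectrum" of three complex bins (hypothetical values).
target = [
    cmath.rect(1.0, 0.0),
    cmath.rect(0.5, math.pi / 4),
    cmath.rect(2.0, -math.pi / 3),
]
# Same magnitudes, but every bin's phase is rotated by 90 degrees.
shifted = [t * cmath.rect(1.0, math.pi / 2) for t in target]

print("magnitude-only loss:", magnitude_l1(shifted, target))  # approximately zero
print("complex-domain loss:", complex_l1(shifted, target))    # clearly non-zero

# Larger windows give finer frequency resolution at 48 kHz
# (window sizes here are assumptions, not values from the paper).
for window in (512, 2048):
    print(window, "samples ->", bin_spacing_hz(48_000, window), "Hz per bin")

# At 10.5x real time, one minute of audio takes under six seconds to process.
print("seconds for 60 s of audio:", processing_time_s(60.0, 10.5))
```

The magnitude-only loss cannot tell the phase-rotated spectrum from the target, whereas the complex-domain loss can. That is one plausible reading of why phase-aware training matters when the model predicts complex STFT values directly, particularly with long windows where each frame spans more signal.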