Mamba2 Meets Silence: Robust Vocal Source Separation for Sparse Regions
By: Euiyeon Kim, Yong-Hoon Choi
Potential Business Impact:
Cleans up songs so that only the singer's voice remains.
We introduce a new music source separation model tailored for accurate vocal isolation. Unlike Transformer-based approaches, which often fail to capture intermittently occurring vocals, our model leverages Mamba2, a recent state space model, to better capture long-range temporal dependencies. To handle long input sequences efficiently, we combine a band-splitting strategy with a dual-path architecture. Experiments show that our approach outperforms recent state-of-the-art models, achieving a cSDR of 11.03 dB (the best reported to date) and delivering substantial gains in uSDR. Moreover, the model exhibits stable and consistent performance across varying input lengths and vocal occurrence patterns. These results demonstrate the effectiveness of Mamba-based models for high-resolution audio processing and open new directions for broader applications in audio research.
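To make the band-split, dual-path idea concrete, here is a minimal sketch of how such a layout can be wired up in PyTorch. Everything here is an illustrative assumption, not the authors' implementation: the module names (`SequenceMixer`, `BandSplit`, `DualPathBlock`), band widths, and dimensions are hypothetical, and `SequenceMixer` is a CPU-friendly LSTM stand-in for a Mamba2 block (in practice one would swap in a real Mamba2 layer such as `mamba_ssm.Mamba2`, which requires CUDA).

```python
# Hypothetical sketch of a band-split + dual-path layout as described in the
# abstract. Names and hyperparameters are illustrative, not the paper's code.
import torch
import torch.nn as nn


class SequenceMixer(nn.Module):
    """Stand-in for a Mamba2 block: maps (B, L, D) -> (B, L, D)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(self.norm(x))
        return x + out  # residual connection


class BandSplit(nn.Module):
    """Split (B, F, T) spectrogram features into K subband token sequences."""
    def __init__(self, band_widths, dim):
        super().__init__()
        self.band_widths = band_widths
        self.proj = nn.ModuleList(nn.Linear(w, dim) for w in band_widths)

    def forward(self, spec):  # spec: (B, F, T), with F == sum(band_widths)
        bands, start = [], 0
        for w, proj in zip(self.band_widths, self.proj):
            sub = spec[:, start:start + w, :]        # (B, w, T)
            bands.append(proj(sub.transpose(1, 2)))  # (B, T, D)
            start += w
        return torch.stack(bands, dim=1)             # (B, K, T, D)


class DualPathBlock(nn.Module):
    """Alternate sequence modeling along the time axis and the band axis."""
    def __init__(self, dim):
        super().__init__()
        self.time_mixer = SequenceMixer(dim)
        self.band_mixer = SequenceMixer(dim)

    def forward(self, x):  # x: (B, K, T, D)
        b, k, t, d = x.shape
        # Time path: each band is treated as its own sequence over T frames.
        x = self.time_mixer(x.reshape(b * k, t, d)).reshape(b, k, t, d)
        # Band path: each frame is treated as a short sequence over K bands.
        x = x.transpose(1, 2)  # (B, T, K, D)
        x = self.band_mixer(x.reshape(b * t, k, d)).reshape(b, t, k, d)
        return x.transpose(1, 2)  # back to (B, K, T, D)


# Toy forward pass: 2 bands over an 80-bin spectrogram, 100 frames.
split = BandSplit(band_widths=[32, 48], dim=64)
block = DualPathBlock(dim=64)
spec = torch.randn(1, 80, 100)
print(block(split(spec)).shape)  # torch.Size([1, 2, 100, 64])
```

The appeal of this wiring, under the stated assumptions, is that each sequence-model pass stays short: the time mixer sees one band at a time and the band mixer sees one frame at a time, which is what makes long, high-resolution spectrogram inputs tractable.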
Similar Papers
State Space Models for Bioacoustics: A Comparative Evaluation with Transformers
Sound
Helps computers identify animal sounds using less power.
Mamba-2 audio captioning: design space exploration and analysis
Sound
Listens to sounds and describes them in words.
SpectMamba: Integrating Frequency and State Space Models for Enhanced Medical Image Detection
CV and Pattern Recognition
Finds sickness in medical pictures faster.