MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement

Published: July 1, 2025 | arXiv ID: 2507.00966v1

By: Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard and others

Potential Business Impact:

Cleans up noisy audio for clearer sound.

Business Areas:

A/B Testing, Data and Analytics

With the advent of new sequence models like Mamba and xLSTM, several studies have shown that these models match or outperform state-of-the-art models in single-channel speech enhancement, automatic speech recognition, and self-supervised audio representation learning. However, prior research has demonstrated that sequence models like LSTM and Mamba tend to overfit to the training set. To address this issue, previous works have shown that adding self-attention to LSTMs substantially improves generalization performance for single-channel speech enhancement. Nevertheless, neither the concept of hybrid Mamba and time-frequency attention models nor their generalization performance have been explored for speech enhancement. In this paper, we propose a novel hybrid architecture, MambAttention, which combines Mamba and shared time- and frequency-multi-head attention modules for generalizable single-channel speech enhancement. To train our model, we introduce VoiceBank+Demand Extended (VB-DemandEx), a dataset inspired by VoiceBank+Demand but with more challenging noise types and lower signal-to-noise ratios. Trained on VB-DemandEx, our proposed MambAttention model significantly outperforms existing state-of-the-art LSTM-, xLSTM-, Mamba-, and Conformer-based systems of similar complexity across all reported metrics on two out-of-domain datasets: DNS 2020 and EARS-WHAM_v2, while matching their performance on the in-domain dataset VB-DemandEx. Ablation studies highlight the role of weight sharing between the time- and frequency-multi-head attention modules for generalization performance. Finally, we explore integrating the shared time- and frequency-multi-head attention modules with LSTM and xLSTM, which yields a notable performance improvement on the out-of-domain datasets. However, our MambAttention model remains superior on both out-of-domain datasets across all reported evaluation metrics.
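The abstract's central mechanism is a single set of multi-head attention weights reused along both the time axis and the frequency axis of a time-frequency representation. The minimal pure-Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. The names `SharedMHA` and `time_freq_attention`, the residual connections, the time-then-frequency ordering, and the random initialization are all hypothetical, and the Mamba blocks are omitted entirely.

```python
import math
import random

# Hypothetical sketch: names, shapes and layout are illustrative assumptions,
# not the MambAttention reference implementation. Mamba blocks are omitted.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


class SharedMHA:
    """Multi-head self-attention with one set of Q/K/V/O projections."""

    def __init__(self, dim: int, heads: int, seed: int = 0):
        assert dim % heads == 0, "dim must be divisible by heads"
        rng = random.Random(seed)
        self.dim, self.heads = dim, heads
        # Four (dim x dim) projection matrices, randomly initialized (assumption).
        self.wq, self.wk, self.wv, self.wo = (
            [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]
            for _ in range(4)
        )

    def num_params(self) -> int:
        return 4 * self.dim * self.dim

    def __call__(self, x: Matrix) -> Matrix:
        """Self-attention over a sequence x of shape (length, dim)."""
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        hd = self.dim // self.heads
        out = [[0.0] * self.dim for _ in x]
        for h in range(self.heads):
            lo, hi = h * hd, (h + 1) * hd  # channel slice owned by head h
            for i, qi in enumerate(q):
                scores = [
                    sum(a * b for a, b in zip(qi[lo:hi], kj[lo:hi])) / math.sqrt(hd)
                    for kj in k
                ]
                for j, w in enumerate(softmax(scores)):
                    for c in range(lo, hi):
                        out[i][c] += w * v[j][c]
        return matmul(out, self.wo)


def time_freq_attention(grid: list[Matrix], mha: SharedMHA) -> list[Matrix]:
    """grid[t][f] is a dim-length feature vector for frame t, bin f.

    The SAME mha object attends along time (one sequence per frequency bin)
    and then along frequency (one sequence per frame), each with a residual.
    """
    out = [[list(vec) for vec in frame] for frame in grid]  # leave input intact
    n_t, n_f = len(out), len(out[0])
    for f in range(n_f):  # time attention
        att = mha([out[t][f] for t in range(n_t)])
        for t in range(n_t):
            out[t][f] = [a + b for a, b in zip(out[t][f], att[t])]
    for t in range(n_t):  # frequency attention, reusing the same weights
        att = mha(out[t])
        out[t] = [[a + b for a, b in zip(x, y)] for x, y in zip(out[t], att)]
    return out


# Usage: 4 frames x 3 frequency bins x 8 feature channels.
rng = random.Random(1)
T, F, D = 4, 3, 8
x = [[[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(F)] for _ in range(T)]
mha = SharedMHA(dim=D, heads=2)
y = time_freq_attention(x, mha)
print(len(y), len(y[0]), len(y[0][0]))  # 4 3 8
```

Because one `mha` object serves both passes, the time and frequency attention together hold 4·D² projection weights rather than the 8·D² that two separate modules would need. This is the kind of weight sharing whose effect on generalization the ablation studies examine.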

Attention Mamba: Time Series Modeling with Adaptive Pooling Acceleration and Receptive Field Enhancements

Machine Learning (CS)

Predicts future events better by seeing more patterns.

2 Apr 2025

Differential Mamba

Machine Learning (CS)

Makes AI better at remembering and understanding long stories.

8 Jul 2025

SpectMamba: Integrating Frequency and State Space Models for Enhanced Medical Image Detection

CV and Pattern Recognition

Finds sickness in medical pictures faster.

1 Sep 2025

Country of Origin

🇩🇰 Denmark

Repos / Data Links

github.com

Page Count

12 pages