Brainprint-Modulated Target Speaker Extraction
By: Qiushi Han, Yuan Liao, Youhao Si, and more
Potential Business Impact:
Helps hearing aids focus on one voice.
Achieving robust and personalized performance in neuro-steered Target Speaker Extraction (TSE) remains a significant challenge for next-generation hearing aids. This is primarily due to two factors: the inherent non-stationarity of EEG signals across sessions, and the high inter-subject variability that limits the efficacy of generalized models. To address these issues, we propose Brainprint-Modulated Target Speaker Extraction (BM-TSE), a novel framework for personalized and high-fidelity extraction. BM-TSE first employs a spatio-temporal EEG encoder with an Adaptive Spectral Gain (ASG) module to extract stable features resilient to non-stationarity. The core of our framework is a personalized modulation mechanism, where a unified brainmap embedding is learned under the joint supervision of subject identification (SID) and auditory attention decoding (AAD) tasks. This learned brainmap, encoding both static user traits and dynamic attentional states, actively refines the audio separation process, dynamically tailoring the output to each user. Evaluations on the public KUL and Cocktail Party datasets demonstrate that BM-TSE achieves state-of-the-art performance, significantly outperforming existing methods. Our code is publicly accessible at: https://github.com/rosshan-orz/BM-TSE.
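To make the architecture described above concrete, here is a minimal PyTorch sketch of the two key ideas: an EEG encoder that produces a "brainmap" embedding trained jointly with SID and AAD heads, and that embedding modulating the audio separator's features. This is an illustration under stated assumptions, not the paper's implementation: the Adaptive Spectral Gain module is approximated by a learned per-channel gain, the modulation is assumed to be FiLM-style, and all module names, dimensions, and head sizes are hypothetical.

```python
import torch
import torch.nn as nn

class BrainmapEncoder(nn.Module):
    """Spatio-temporal EEG encoder producing a 'brainmap' embedding.

    Hypothetical sketch: the paper's Adaptive Spectral Gain (ASG) module
    is stood in for by a learned per-channel gain; details may differ.
    """
    def __init__(self, n_eeg_channels=64, emb_dim=128):
        super().__init__()
        # Learned per-channel gain (stand-in for ASG).
        self.channel_gain = nn.Parameter(torch.ones(n_eeg_channels, 1))
        # Temporal convolution over the gained EEG channels.
        self.temporal = nn.Conv1d(n_eeg_channels, emb_dim,
                                  kernel_size=15, padding=7)
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, eeg):               # eeg: (batch, channels, time)
        x = eeg * self.channel_gain       # adaptive gain per EEG channel
        x = torch.relu(self.temporal(x))  # spatio-temporal features
        return self.pool(x).squeeze(-1)   # (batch, emb_dim) brainmap embedding

class BMTSESketch(nn.Module):
    """Brainmap-modulated extraction: the embedding feeds SID and AAD
    heads and modulates audio features FiLM-style (an assumption)."""
    def __init__(self, n_subjects=16, emb_dim=128, audio_dim=256):
        super().__init__()
        self.encoder = BrainmapEncoder(emb_dim=emb_dim)
        self.sid_head = nn.Linear(emb_dim, n_subjects)  # subject identification
        self.aad_head = nn.Linear(emb_dim, 2)           # attended speaker (e.g. L/R)
        self.to_scale = nn.Linear(emb_dim, audio_dim)
        self.to_shift = nn.Linear(emb_dim, audio_dim)

    def forward(self, eeg, audio_feats):  # audio_feats: (batch, audio_dim, frames)
        z = self.encoder(eeg)
        scale = self.to_scale(z).unsqueeze(-1)
        shift = self.to_shift(z).unsqueeze(-1)
        # Personalized modulation of the separator's audio features.
        modulated = audio_feats * (1 + scale) + shift
        return modulated, self.sid_head(z), self.aad_head(z)
```

Under this reading, training would combine an extraction loss on the separated audio with cross-entropy losses from the SID and AAD heads, so the brainmap embedding is pushed to encode both the static user identity and the dynamic attentional state; the exact losses and weights are the authors' choice and can be found in the linked repository.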
Similar Papers
M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction
Audio and Speech Processing
Helps computers hear one voice in a crowd.
A Robust Multi-Scale Framework with Test-Time Adaptation for sEEG-Based Speech Decoding
Human-Computer Interaction
Lets paralyzed people talk by reading brain waves.
Robust Audio-Visual Target Speaker Extraction with Emotion-Aware Multiple Enrollment Fusion
Audio and Speech Processing
Helps computers focus on one voice in noisy rooms.