AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers
By: Linya Fu, Yu Liu, Zhijie Liu, and more
Potential Business Impact:
Finds sounds in 3D, even when mixed.
We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources. AuralNet employs a gated coarse-to-fine architecture, combining a coarse classification stage with a fine-grained regression stage, allowing for flexible spatial resolution through sector partitioning. The model incorporates a multi-head self-attention mechanism to capture spatial cues in binaural signals, enhancing robustness in noisy-reverberant environments. A masked multi-task loss function is designed to jointly optimize sound detection, azimuth, and elevation estimation. Extensive experiments in noisy-reverberant conditions demonstrate the superiority of AuralNet over recent methods.
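To make the masked multi-task loss concrete, here is a minimal NumPy sketch of one plausible form: per-sector detection is scored with binary cross-entropy, while the azimuth and elevation regression terms are masked so that only sectors containing an active source contribute. The function name, sector layout, and the weights `alpha`/`beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def masked_multitask_loss(det_logits, az_pred, el_pred,
                          det_true, az_true, el_true,
                          alpha=1.0, beta=1.0):
    """Hypothetical sketch of a masked multi-task loss.

    det_*: (S,) per-sector source-presence logits / binary labels.
    az_*, el_*: (S,) per-sector azimuth/elevation estimates and targets
                (e.g. normalized offsets within each sector).
    """
    # Binary cross-entropy for per-sector sound detection.
    p = 1.0 / (1.0 + np.exp(-det_logits))
    eps = 1e-7
    bce = -np.mean(det_true * np.log(p + eps)
                   + (1 - det_true) * np.log(1 - p + eps))

    # Mask: regression errors count only in sectors with an active source,
    # so empty sectors cannot dominate the angle estimation terms.
    mask = det_true.astype(float)
    n_active = max(mask.sum(), 1.0)
    az_mse = np.sum(mask * (az_pred - az_true) ** 2) / n_active
    el_mse = np.sum(mask * (el_pred - el_true) ** 2) / n_active

    return bce + alpha * az_mse + beta * el_mse
```

With this masking, an inactive sector's (meaningless) angle predictions add nothing to the loss, which is the point of jointly optimizing detection and regression without knowing the source count in advance.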
Similar Papers
Beamformed 360° Sound Maps: U-Net-Driven Acoustic Source Segmentation and Localization
Audio and Speech Processing
Pinpoints sounds from all directions accurately.
QASTAnet: A DNN-based Quality Metric for Spatial Audio
Audio and Speech Processing
Tests sound quality faster and cheaper.
In-the-wild Audio Spatialization with Flexible Text-guided Localization
Sound
Makes game sounds move with your head.