AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers

Published: June 3, 2025 | arXiv ID: 2506.02773v1

By: Linya Fu, Yu Liu, Zhijie Liu, and more

Potential Business Impact:

Finds sounds in 3D, even when mixed.

Business Areas:
Augmented Reality Hardware, Software

We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources. AuralNet employs a gated coarse-to-fine architecture that combines a coarse classification stage with a fine-grained regression stage, allowing for flexible spatial resolution through sector partitioning. The model incorporates a multi-head self-attention mechanism to capture spatial cues in binaural signals, enhancing robustness in noisy-reverberant environments. A masked multi-task loss function is designed to jointly optimize sound detection, azimuth estimation, and elevation estimation. Extensive experiments in noisy-reverberant conditions demonstrate the superiority of AuralNet over recent methods.
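As a rough illustration of the ideas in the abstract, the sketch below shows how a gated coarse-to-fine readout and a masked multi-task loss could fit together. It is not the authors' implementation: the function names, the 30-degree sector width, the 0.5 gating threshold, and the choice of binary cross-entropy plus squared error are all assumptions made for this example, and the decoder handles azimuth only for brevity.

```python
import math


def decode_sectors(activity, offsets, sector_width=30.0, threshold=0.5):
    """Turn per-sector outputs into azimuth estimates in degrees.

    activity: per-sector probabilities in [0, 1] (coarse classification stage).
    offsets:  per-sector positions in [0, 1] inside each sector (fine regression stage).
    sector_width and threshold are illustrative values, not taken from the paper.
    """
    estimates = []
    for k, (prob, offset) in enumerate(zip(activity, offsets)):
        if prob >= threshold:  # gate: only sectors judged active yield a source
            estimates.append(k * sector_width + offset * sector_width)
    return estimates


def masked_multitask_loss(activity, az_pred, el_pred, target_active, az_true, el_true):
    """Detection loss on every sector plus angle loss on active sectors only.

    target_active holds 0/1 labels per sector and acts as the mask, so sectors
    without a source contribute nothing to the azimuth and elevation terms.
    """
    eps = 1e-7
    detection = 0.0
    for prob, target in zip(activity, target_active):
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        detection -= target * math.log(prob) + (1 - target) * math.log(1.0 - prob)
    detection /= len(activity)

    regression = 0.0
    n_active = sum(target_active)
    if n_active > 0:
        for mask, ap, ep, at, et in zip(target_active, az_pred, el_pred, az_true, el_true):
            regression += mask * ((ap - at) ** 2 + (ep - et) ** 2)
        regression /= n_active
    return detection + regression


# Three sectors; sectors 0 and 2 pass the gate.
print(decode_sectors([0.9, 0.1, 0.8], [0.5, 0.2, 0.25]))
```

Because the number of sectors passing the gate is not fixed in advance, a readout of this kind does not need the number of sources as an input, which matches the property the abstract claims.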

Country of Origin
🇭🇰 Hong Kong

Page Count
5 pages

Category
Electrical Engineering and Systems Science:
Audio and Speech Processing