Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation
By: Davide Berghi, Philip J. B. Jackson
Potential Business Impact:
Helps robots hear where sounds are coming from.
Sound event localization and detection (SELD) involves predicting active sound event classes over time while estimating their positions. The localization subtask in SELD is usually treated as a direction-of-arrival estimation problem, ignoring source distance. Only recently has SELD been extended to 3D by incorporating distance estimation, enabling the prediction of sound event positions in 3D space (3D SELD). However, existing methods lack input features designed for distance estimation. We argue that reverberation encodes valuable information for this task. This paper introduces two novel feature formats for 3D SELD based on reverberation: one using the direct-to-reverberant ratio (DRR) and another leveraging signal autocorrelation to provide the model with insights into early reflections. Pre-training on synthetic data improves relative distance error (RDE) and overall SELD score, with autocorrelation-based features reducing RDE by over 3 percentage points on the STARSS23 dataset. The code to extract the features is available at github.com/dberghi/SELD-distance-features.
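To make the two reverberation cues concrete, here is a minimal Python sketch of the textbook quantities behind them. It is not the authors' feature extractor (that is in the repository linked above). It assumes a known room impulse response for the DRR, whereas a SELD system only has the recorded signals. The function names `drr_db`, `normalized_autocorr` and `toy_rir`, the 2.5 ms direct-path window, and the synthetic impulse response are all illustrative assumptions.

```python
import numpy as np

def drr_db(rir, fs, direct_window_ms=2.5):
    """Textbook DRR in dB: energy around the direct-path peak over later energy.
    The window length is an assumed, commonly used value."""
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))              # direct sound = strongest tap
    half = int(round(direct_window_ms * 1e-3 * fs))
    start, stop = max(peak - half, 0), min(peak + half + 1, rir.size)
    direct = np.sum(rir[start:stop] ** 2)
    reverb = np.sum(rir[stop:] ** 2)                # must be non-zero
    return 10.0 * np.log10(direct / reverb)

def normalized_autocorr(x, max_lag):
    """Autocorrelation for lags 0..max_lag via the Wiener-Khinchin theorem.
    Peaks at non-zero lags hint at delayed copies such as early reflections."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    nfft = 1 << (2 * x.size - 1).bit_length()       # zero-pad: no circular wrap
    power = np.abs(np.fft.rfft(x, nfft)) ** 2
    acf = np.fft.irfft(power, nfft)[: max_lag + 1]
    return acf / acf[0]

def toy_rir(distance_m, fs=24000, rt60=0.5, seed=0):
    """Hypothetical RIR: 1/r direct impulse plus an exponentially decaying noise tail."""
    rng = np.random.default_rng(seed)
    n = int(fs * rt60)
    t = np.arange(n) / fs
    rir = 0.02 * rng.standard_normal(n) * np.exp(-6.9 * t / rt60)
    delay = int(round(distance_m / 343.0 * fs))     # propagation delay in samples
    rir[: delay + 1] = 0.0                          # nothing arrives before the direct sound
    rir[delay] = 1.0 / distance_m
    return rir

if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0):
        print(f"distance {d} m: DRR = {drr_db(toy_rir(d), 24000):.1f} dB")
```

In this toy setup the DRR falls as the source moves away, because the direct path weakens with distance while the reverberant tail stays roughly constant. That is the intuition behind using reverberation as a distance cue.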
Similar Papers
An Experimental Study on Joint Modeling for Sound Event Localization and Detection with Source Distance Estimation
Sound
Pinpoints sound location in 3D space.
Location-Oriented Sound Event Localization and Detection with Spatial Mapping and Regression Localization
Sound
Hears many sounds at once, even overlapping.
Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos
Audio and Speech Processing
Helps computers understand sounds and sights together.