A robust framework for sound event localization and detection on real recordings
By: Jin Sob Kim, Hyun Joon Park, Wooseok Shin and more
Potential Business Impact:
Finds sounds and where they come from.
This technical report describes the systems submitted to DCASE2022 challenge task 3: sound event localization and detection (SELD). The task aims to detect occurrences of sound events, identify their class, and estimate their position. Our system uses a ResNet-based model within a proposed robust framework for SELD. To ensure generalized performance on real-world sound scenes, we design the overall framework with augmentation techniques, a pipeline for mixing datasets from real-world sound scenes and emulations, and test time augmentation. Augmentation techniques and the use of external sound sources make it possible to train on diverse samples, while maintaining the number of real recording samples in each batch preserves sufficient exposure to the real-world context. In addition, we design a test time augmentation and a clustering-based model ensemble method to aggregate confident predictions. Experimental results show that the model under the proposed framework outperforms the baseline methods and achieves competitive performance on real-world sound recordings.
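The idea of keeping the number of real recording samples fixed in every batch can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mixed_batches`, its parameters, and the choice to sample emulated clips with replacement are all assumptions made for illustration, and the report's actual batch sizes and ratios are not given here.

```python
import random


def mixed_batches(real, emulated, batch_size, n_real, n_batches, seed=0):
    """Yield batches that each contain exactly `n_real` real-recording samples.

    Hypothetical helper: `real` and `emulated` are lists of sample identifiers.
    The remaining `batch_size - n_real` slots are filled from `emulated`.
    """
    if not 0 <= n_real <= batch_size:
        raise ValueError("n_real must be between 0 and batch_size")
    if n_real > len(real):
        raise ValueError("not enough real samples to fill the real quota")
    rng = random.Random(seed)
    for _ in range(n_batches):
        # Real recordings: drawn without replacement within a batch.
        batch = rng.sample(real, n_real)
        # Emulated clips: drawn with replacement (an assumption of this sketch).
        batch += rng.choices(emulated, k=batch_size - n_real)
        rng.shuffle(batch)
        yield batch


# Toy data: tuples tag each sample with its origin.
real_clips = [("real", i) for i in range(10)]
emulated_clips = [("emulated", i) for i in range(100)]
batches = list(mixed_batches(real_clips, emulated_clips,
                             batch_size=8, n_real=2, n_batches=5))
```

With this scheme, adding more emulated or augmented data enlarges the pool the model sees without diluting the share of real recordings in any single training step.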
Similar Papers
Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation
Sound
Helps computers pinpoint sounds in noisy places.
A Two-Step Learning Framework for Enhancing Sound Event Localization and Detection
Sound
Finds where sounds come from in 3D.
Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos
Audio and Speech Processing
Finds sounds and their direction in videos.