
Sound Event Detection with Boundary-Aware Optimization and Inference

Published: January 7, 2026 | arXiv ID: 2601.04178v1

By: Florian Schmid, Chi Ian Tang, Sanjeel Parekh and more

Potential Business Impact:

Finds the start and end of sounds more precisely.

Business Areas:
Speech Recognition Data and Analytics, Software

Temporal detection problems appear in many fields, including time-series estimation, activity recognition, and sound event detection (SED). In this work, we propose a new approach to temporal event modeling that explicitly models event onsets and offsets and introduces boundary-aware optimization and inference strategies that substantially enhance temporal event detection. The methodology incorporates two new temporal modeling layers, Recurrent Event Detection (RED) and Event Proposal Network (EPN), which, together with tailored loss functions, enable more precise temporal event detection. We evaluate the proposed method in the SED domain using a subset of the temporally strongly annotated portion of AudioSet. Experimental results show that our approach not only outperforms traditional frame-wise SED models with state-of-the-art post-processing, but also removes the need for post-processing hyperparameter tuning and scales to achieve new state-of-the-art performance across all AudioSet Strong classes.
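
For context, the frame-wise baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of conventional threshold-plus-median-filter post-processing, not the RED or EPN layers proposed in the paper. The function names, the 0.5 threshold, the filter width, and the 20 ms hop are all assumptions chosen for the example.

```python
import numpy as np


def median_filter_1d(x, width):
    """Sliding median over a 1-D array with edge padding (width must be odd)."""
    half = width // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)


def frames_to_events(probs, threshold=0.5, filter_width=3, hop_seconds=0.02):
    """Turn per-frame probabilities for one class into (onset, offset) pairs.

    threshold, filter_width and hop_seconds are illustrative defaults
    (assumptions), not values taken from the paper.
    """
    # Step 1: binarize each frame independently.
    active = (np.asarray(probs, dtype=float) > threshold).astype(float)
    # Step 2: smooth the binary decisions with a median filter.
    smoothed = median_filter_1d(active, filter_width) > 0.5
    # Step 3: pad with inactive frames so every run has a rising and falling edge.
    padded = np.concatenate(([False], smoothed, [False])).astype(int)
    edges = np.diff(padded)
    onsets = np.flatnonzero(edges == 1)    # first active frame of each run
    offsets = np.flatnonzero(edges == -1)  # first inactive frame after each run
    return [(float(s * hop_seconds), float(e * hop_seconds))
            for s, e in zip(onsets, offsets)]


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.9, 0.9, 0.1, 0.1, 0.7, 0.1]
    # Same probabilities, two filter widths: the detected events differ.
    print(frames_to_events(probs, filter_width=1, hop_seconds=1.0))
    print(frames_to_events(probs, filter_width=3, hop_seconds=1.0))
```

On this toy input, a filter width of 1 yields three separate events, while a width of 3 merges the first two and drops the short third one. Event boundaries in such a pipeline therefore depend on hand-tuned post-processing settings, which is the tuning burden the abstract says the proposed approach removes.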

Page Count
5 pages

Category
Electrical Engineering and Systems Science:
Audio and Speech Processing