Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes
By: Masahiro Yasuda, Binh Thien Nguyen, Noboru Harada, and more
Potential Business Impact:
Lets computers hear and place sounds in 3D.
Spatial Semantic Segmentation of Sound Scenes (S5) aims to advance technologies for detecting and separating sound events from multi-channel input signals that mix multiple sound events with spatial information, a fundamental building block of immersive communication. The ultimate goal is to separate sound event signals with 6 Degrees of Freedom (6DoF) information into dry sound object signals and metadata describing the object type (sound event class) and spatial information, including direction. However, because several existing challenge tasks already cover some of these component functions, this year's task focuses on detecting and separating sound events from multi-channel spatial input signals. This paper outlines the S5 task setting of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2025 Challenge Task 4 and the DCASE2025 Task 4 Dataset, newly recorded and curated for this task. We also report experimental results for an S5 system trained and evaluated on this dataset. The full version of this paper will be published after the challenge results are made public.
Similar Papers
Metric Analysis for Spatial Semantic Segmentation of Sound Scenes
Sound
Better measures how well computers hear sounds.
Description and Discussion on DCASE 2025 Challenge Task 2: First-shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
Sound
Find broken machines by listening for strange sounds.
Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos
Audio and Speech Processing
Helps computers understand sounds and sights together.