Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026
By: Candy Olivia Mawalim, Haotian Zhang, Shogo Okada
Potential Business Impact:
Detects fake sounds to keep audio real.
This paper presents our submission to the ICASSP 2026 Environmental Sound Deepfake Detection (ESDD) Challenge. The challenge is built on the large-scale EnvSDD dataset, which contains a variety of synthetic environmental sounds. We address the difficulties posed by unseen generators and low-resource black-box scenarios by proposing an audio-text cross-attention model. Experiments with individual and combined text-audio models show competitive equal error rate (EER) improvements over the challenge baseline (the BEATs+AASIST model).
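For readers unfamiliar with the EER metric mentioned above, the following is a minimal sketch of how an equal error rate can be computed from detection scores. It is not the challenge's official scoring code. The function name `compute_eer`, the label convention (1 = fake, 0 = real), and the assumption that higher scores mean "more likely fake" are all illustrative choices made here. The sketch sweeps each observed score as a threshold and reports the point where the false positive rate and false negative rate are closest, without interpolating between thresholds.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate (EER).

    Assumptions (illustrative, not taken from the challenge rules):
    - labels use 1 for fake (positive) and 0 for real (negative);
    - a higher score means the clip is more likely fake.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("EER needs at least one fake and one real example")

    # Sort by score, highest first, so thresholds can be lowered step by step.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    # Threshold above every score: nothing is flagged, so FPR = 0 and FNR = 1.
    best_gap, eer = 1.0, 0.5
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Consume all examples sharing this score so ties are handled together.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr = fp / n_neg          # real clips wrongly flagged as fake
        fnr = 1.0 - tp / n_pos    # fake clips that were missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2.0
        i = j
    return eer


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    toy_scores = [0.9, 0.6, 0.7, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(f"EER = {compute_eer(toy_scores, toy_labels):.2f}")
```

A lower EER is better: 0 means the scores separate fake from real clips perfectly, while 0.5 is what uninformative scores would give.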