Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026

Published: December 5, 2025 | arXiv ID: 2512.06041v1

By: Candy Olivia Mawalim, Haotian Zhang, Shogo Okada

Potential Business Impact:

Detects fake sounds to keep audio real.

Business Areas:
Speech Recognition Data and Analytics, Software

This paper presents our work for the ICASSP 2026 Environmental Sound Deepfake Detection (ESDD) Challenge. The challenge is based on the large-scale EnvSDD dataset, which consists of various synthetic environmental sounds. We address the complexities of unseen generators and low-resource black-box scenarios by proposing an audio-text cross-attention model. Experiments with individual and combined text-audio models show competitive equal error rate (EER) improvements over the challenge baseline, a BEATs+AASIST model.
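Since results are reported as EER, a minimal sketch of how that metric can be approximated from detection scores may help. This is not the challenge's official scoring script; the function name `equal_error_rate`, the convention that higher scores mean "more likely fake", and the label encoding (1 = fake, 0 = real) are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumed conventions (not taken from the paper):
      scores: higher means "more likely fake"
      labels: 1 for fake, 0 for real
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")

    best_gap, eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus one above the maximum.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for t in candidates:
        # False acceptance rate: real clips flagged as fake at threshold t.
        far = sum(s >= t for s in negatives) / len(negatives)
        # False rejection rate: fake clips that fall below threshold t.
        frr = sum(s < t for s in positives) / len(positives)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

For example, `equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])` evaluates to 0.0 because the two classes are perfectly separated. A lower EER indicates a better detector.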

Page Count
3 pages

Category
Computer Science:
Sound