Toward Noise-Aware Audio Deepfake Detection: Survey, SNR-Benchmarks, and Practical Recipes
By: Udayon Sen, Alka Luqman, Anupam Chattopadhyay
Deepfake audio detection has progressed rapidly with strong pre-trained encoders (e.g., WavLM, Wav2Vec2, MMS). However, performance under realistic capture conditions (background noise from domestic, office, and transport settings; room reverberation; consumer channels) often lags results on clean laboratory data. We survey and evaluate the robustness of state-of-the-art audio deepfake detection models and present a reproducible framework that mixes MS-SNSD noises with ASVspoof 2021 DF utterances for evaluation at controlled signal-to-noise ratios (SNRs). SNR is a widely used, measurable proxy for noise severity in speech; sweeping it from near-clean (35 dB) to very noisy (-5 dB) lets us quantify how gracefully detectors degrade. We study multi-condition training and fixed-SNR testing for pre-trained encoders (WavLM, Wav2Vec2, MMS), reporting accuracy, ROC-AUC, and EER on binary and four-class (authenticity × corruption) tasks. In our experiments, fine-tuning reduces EER by 10 to 15 percentage points in the 10 dB to 0 dB SNR range across backbones.
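To make the SNR-controlled mixing concrete, the following is a minimal sketch (not the paper's released code) of how a noise clip can be scaled and added to an utterance so that the resulting speech-to-noise power ratio matches a target SNR. The function name and the specific SNR grid are illustrative assumptions; MS-SNSD ships its own mixing scripts with additional handling (level normalization, clipping checks) that this sketch omits.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech` at the target SNR (in dB).

    Both inputs are 1-D float arrays at the same sample rate; the noise clip
    is tiled/cropped to the utterance length. Illustrative sketch only.
    """
    # Repeat or crop the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    # Average signal powers (small epsilon avoids division by zero on silence).
    p_speech = np.mean(speech ** 2) + 1e-12
    p_noise = np.mean(noise ** 2) + 1e-12

    # Choose gain g so that 10*log10(p_speech / (g^2 * p_noise)) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

# Example SNR sweep from near-clean to very noisy, as described in the abstract.
snr_grid_db = [35, 25, 15, 10, 5, 0, -5]
```

A fixed grid like this supports both multi-condition training (sampling an SNR per utterance) and fixed-SNR test sets for reporting per-condition accuracy, ROC-AUC, and EER.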