FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
By: Zeyu Xie, Yaoyun Zhang, Xuenan Xu, and others
Potential Business Impact:
Finds fake sounds in recordings and explains how they were faked.
The rapid development of generative audio raises ethical and security concerns stemming from forged data, making deepfake sound detection an important safeguard against the malicious use of such technologies. Although prior studies have explored this task, existing methods largely focus on binary classification and fall short in explaining how manipulations occur, tracing where forged content originated, or generalizing to unseen sources, thereby limiting the explainability and reliability of detection. To address these limitations, we present FakeSound2, a benchmark designed to advance deepfake sound detection beyond binary accuracy. FakeSound2 evaluates models along three dimensions: localization, traceability, and generalization, covering 6 manipulation types and 12 diverse sources. Experimental results show that although current systems achieve high classification accuracy, they struggle to recognize forged pattern distributions and to provide reliable explanations. By highlighting these gaps, FakeSound2 establishes a comprehensive benchmark that reveals key challenges and aims to foster robust, explainable, and generalizable approaches for trustworthy audio authentication.
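The localization dimension above, finding where in a clip the manipulation occurred, can be illustrated with a toy frame-level scorer. This is a minimal sketch: the segment format, frame resolution, and F1 metric are illustrative assumptions, not FakeSound2's official evaluation protocol.

```python
# Hypothetical sketch of frame-level localization scoring for deepfake
# sound detection. Segments are (start_frame, end_frame) half-open
# intervals; the F1 metric here is an assumption for illustration.

def segments_to_frames(segments, n_frames):
    """Mark each frame 1 if it falls inside any fake segment."""
    labels = [0] * n_frames
    for start, end in segments:
        for i in range(max(start, 0), min(end, n_frames)):
            labels[i] = 1
    return labels

def localization_f1(pred_segments, true_segments, n_frames):
    """Frame-level F1 between predicted and reference fake regions."""
    pred = segments_to_frames(pred_segments, n_frames)
    true = segments_to_frames(true_segments, n_frames)
    tp = sum(1 for p, t in zip(pred, true) if p and t)
    fp = sum(1 for p, t in zip(pred, true) if p and not t)
    fn = sum(1 for p, t in zip(pred, true) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: predicted fake region [20, 60) vs. reference [30, 70)
# over a 100-frame clip; overlap is 30 frames, so F1 = 0.75.
score = localization_f1([(20, 60)], [(30, 70)], 100)
```

A binary classifier that flags the whole clip would score poorly here, which is exactly the gap between classification accuracy and explainable localization that the benchmark probes.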
Similar Papers
DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection
Multimedia
Finds fake video and audio more reliably.
Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
Sound
Finds fake voices that trick sound detectors.
Beyond Identity: A Generalizable Approach for Deepfake Audio Detection
Sound
Finds fake voices by ignoring who is speaking.