Sparse deepfake detection promotes better disentanglement
By: Antoine Teissier, Marie Tahon, Nicolas Dugué, and more
Potential Business Impact:
Finds fake voices by looking at hidden sound patterns.
Due to the rapid progress of speech synthesis, deepfake detection has become a major concern in the speech processing community. Because it is a critical task, systems must not only be efficient and robust, but also provide interpretable explanations. Among the different approaches to explainability, we focus on the interpretation of latent representations, specifically the last embedding layer of AASIST, a deepfake detection architecture. We apply a TopK activation, inspired by sparse autoencoders (SAEs), to this layer to obtain sparse representations that are used in the decision process. We demonstrate that sparse deepfake detection can improve detection performance, reaching an EER of 23.36% on the ASVSpoof5 test set at 95% sparsity. We then show that these representations provide better disentanglement, using completeness and modularity metrics based on mutual information. Notably, some attacks are directly encoded in the latent space.
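To make the sparsification step concrete, here is a minimal PyTorch sketch of a TopK activation applied to a batch of last-layer embeddings. This is not the authors' implementation: the embedding dimension, the batch size, and the `topk_activation` helper are illustrative assumptions; only the idea of keeping the k largest activations per sample (95% sparsity means keeping the top 5% of units) comes from the abstract.

```python
import torch

def topk_activation(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest activations per sample, zero out the rest.

    x: (batch, dim) embedding tensor, e.g. the last-layer
       embeddings of a detector such as AASIST.
    k: number of units left active per sample.
    """
    values, indices = torch.topk(x, k, dim=-1)
    sparse = torch.zeros_like(x)
    sparse.scatter_(-1, indices, values)  # re-insert only the top-k values
    return sparse

# Hypothetical usage: 160-dim embeddings, 95% sparsity -> 8 active units.
emb = torch.randn(4, 160)           # batch of last-layer embeddings
k = max(1, int(emb.size(-1) * 0.05))
sparse_emb = topk_activation(emb, k)
```

The sparse embeddings would then feed the classification head unchanged; the TopK constraint forces the decision to rest on a handful of latent units, which is what makes the per-unit disentanglement analysis (completeness and modularity via mutual information) meaningful.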
Similar Papers
The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
CV and Pattern Recognition
Shows how fake videos are spotted.
Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection
Sound
Stops fake voices from tricking voice security.
Unmasking Deepfakes: Leveraging Augmentations and Features Variability for Deepfake Speech Detection
Sound
Finds fake voices in recordings better.