Forensic Similarity for Speech Deepfakes
By: Viola Negroni, Davide Salvi, Daniele Ugo Leonzio, et al.
Potential Business Impact:
Finds fake voices by matching sound clues.
In this paper, we introduce a digital audio forensics approach called Forensic Similarity for Speech Deepfakes, which determines whether or not two audio segments contain the same forensic traces. Our work is inspired by prior work on forensic similarity in the image domain, which demonstrated strong generalization to unknown forensic traces without requiring prior knowledge of them at training time. To achieve this in the audio setting, we propose a two-part deep-learning system composed of a feature extractor, based on a speech deepfake detector backbone, and a shallow neural network referred to as the similarity network. This system maps pairs of audio segments to a score indicating whether they contain the same or different forensic traces. We evaluate the system on the emerging task of source verification, highlighting its ability to identify whether two samples originate from the same generative model. Additionally, we assess its applicability to splicing detection as a complementary use case. Experiments show that the method generalizes to a wide range of forensic traces, including previously unseen ones, illustrating its flexibility and practical value in digital audio forensics.
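The two-part pipeline described above can be sketched in miniature. The snippet below is a minimal illustration, not the authors' implementation: a fixed random projection stands in for the trained deepfake-detector backbone, and the "similarity network" is a tiny untrained one-hidden-layer MLP over the concatenated pair of embeddings. All dimensions, weights, and function names here are hypothetical.

```python
import numpy as np

SEG_LEN, FEAT_DIM, HIDDEN = 16000, 128, 64  # e.g. 1 s of audio at 16 kHz (illustrative)
rng = np.random.default_rng(0)

# Stand-in for the detector backbone: a fixed random projection mapping a raw
# waveform segment to an embedding (a real system would use a trained network).
W_feat = rng.normal(0.0, 1.0 / np.sqrt(SEG_LEN), size=(SEG_LEN, FEAT_DIM))

def extract_features(segment):
    """Map one audio segment to a forensic-trace embedding (placeholder)."""
    return np.tanh(segment @ W_feat)

# Shallow "similarity network": one hidden layer over the concatenated pair of
# embeddings; sigmoid output read as P(same forensic trace). Weights untrained.
W1 = rng.normal(0.0, 1.0 / np.sqrt(2 * FEAT_DIM), size=(2 * FEAT_DIM, HIDDEN))
W2 = rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), size=(HIDDEN, 1))

def similarity_score(seg_a, seg_b):
    """Score a pair of segments: closer to 1 means 'same forensic trace'."""
    pair = np.concatenate([extract_features(seg_a), extract_features(seg_b)])
    h = np.maximum(pair @ W1, 0.0)                    # ReLU hidden layer
    return float(1.0 / (1.0 + np.exp(-(h @ W2)[0])))  # sigmoid in (0, 1)

# Toy usage: two random "segments" in place of real speech.
a = rng.normal(size=SEG_LEN)
b = rng.normal(size=SEG_LEN)
s = similarity_score(a, b)
print(f"same-trace score: {s:.3f}")
```

In the paper's setting, the backbone would be trained as part of a speech deepfake detector and the similarity head trained on labeled same/different pairs; at test time only the scalar score is thresholded, which is what lets the method handle generative models never seen during training.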
Similar Papers
Source Verification for Speech Deepfakes
Sound
Finds who made fake voices.
Forensic deepfake audio detection using segmental speech features
Sound
Finds fake voices by listening to how sounds are made.
SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection
CV and Pattern Recognition
Spots fake videos by listening to voices.