ALLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection
By: Hao Gu, Jiangyan Yi, Chenglong Wang and more
Potential Business Impact:
Finds fake voices in audio recordings.
Audio deepfake detection (ADD) has grown increasingly important due to the rise of high-fidelity audio generative models and their potential for misuse. Given that audio large language models (ALLMs) have made significant progress in various audio processing tasks, a natural question arises: can ALLMs be leveraged to solve ADD? In this paper, we first conduct a comprehensive zero-shot evaluation of ALLMs on ADD, revealing their ineffectiveness. To address this, we propose ALLM4ADD, an ALLM-driven framework for ADD. Specifically, we reformulate the ADD task as an audio question answering problem, prompting the model with the question: "Is this audio fake or real?". We then perform supervised fine-tuning to enable the ALLM to assess the authenticity of the query audio. Extensive experiments demonstrate that our ALLM-based method achieves superior performance in fake audio detection, particularly in data-scarce scenarios. As a pioneering study, we anticipate that this work will inspire the research community to leverage ALLMs to develop more effective ADD systems. Code is available at https://github.com/ucas-hao/qwen_audio_for_add.git
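The audio question answering reformulation described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the authors' repository. The names `AQASample`, `build_sample`, and `parse_answer` are hypothetical, and the one-word "fake"/"real" answer format is an assumption. Only the question text comes from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

# Question text quoted in the abstract.
PROMPT = "Is this audio fake or real?"
# Assumed answer vocabulary; the abstract does not specify the target format.
ANSWERS = ("fake", "real")


@dataclass
class AQASample:
    """One supervised fine-tuning record (hypothetical layout)."""
    audio_path: str
    question: str
    answer: str


def build_sample(audio_path: str, label: str) -> AQASample:
    """Turn a labeled ADD utterance into a question-answer pair."""
    normalized = label.strip().lower()
    if normalized not in ANSWERS:
        raise ValueError(f"label must be one of {ANSWERS}, got {label!r}")
    return AQASample(audio_path=audio_path, question=PROMPT, answer=normalized)


def parse_answer(generated: str) -> Optional[str]:
    """Map free-form model output to 'fake' or 'real'.

    Returns None when the text mentions both words or neither,
    so ambiguous generations are not silently counted.
    """
    text = generated.lower()
    has_fake = "fake" in text
    has_real = "real" in text
    if has_fake == has_real:
        return None
    return "fake" if has_fake else "real"
```

Under these assumptions, `build_sample("utt_0001.wav", "fake")` yields a record pairing the audio with the fixed question and the target answer "fake", and `parse_answer` converts the fine-tuned model's generated text back into a binary decision for scoring.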
Similar Papers
DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components
Sound
Finds fake voices better by improving AI.
Can Audio Large Language Models Verify Speaker Identity?
Sound
Lets computers know who is talking.
Probing Audio-Generation Capabilities of Text-Based Language Models
Sound
Computers learn to make sounds from words.