WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection
By: Xi Xuan, Xuechen Liu, Wenxin Zhang, and more
Potential Business Impact:
Finds fake voices more accurately with less computing power.
Modern front-end design for speech deepfake detection relies on full fine-tuning of large pre-trained models such as XLSR. However, this approach is not parameter-efficient and may lead to suboptimal generalization to realistic, in-the-wild data. To address these limitations, we introduce a new family of parameter-efficient front-ends that fuse prompt tuning with classical signal-processing transforms: FourierPT-XLSR, which uses the Fourier transform, and two variants based on the wavelet transform, WSPT-XLSR and Partial-WSPT-XLSR. We further propose WaveSP-Net, a novel architecture that combines a Partial-WSPT-XLSR front-end with a bidirectional Mamba-based back-end. This design injects multi-resolution features into the prompt embeddings, enhancing the localization of subtle synthetic artifacts without altering the frozen XLSR parameters. Experimental results demonstrate that WaveSP-Net outperforms several state-of-the-art models on two new and challenging benchmarks, Deepfake-Eval-2024 and SpoofCeleb, with few trainable parameters and notable performance gains. The code and models are available at https://github.com/xxuan-acoustics/WaveSP-Net.
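To make the wavelet-domain prompt-tuning idea concrete, here is a minimal PyTorch sketch. It assumes a single-level Haar transform, soft-thresholded learnable coefficients, and a frozen encoder whose hidden states are stood in for by random tensors; the module name WaveletSparsePrompt, the shapes, and the thresholding scheme are illustrative assumptions, not the released WaveSP-Net implementation (see the GitHub repository above for the authors' code).

# Hedged sketch: names, shapes, and the Haar-based transform are illustrative
# assumptions, not the authors' released implementation.
import math
import torch
import torch.nn as nn


class WaveletSparsePrompt(nn.Module):
    """Learnable prompt parameterized in a single-level Haar wavelet domain.

    The trainable parameters are wavelet coefficients; an inverse Haar
    transform reconstructs the time-domain prompt, which is prepended to the
    frozen encoder's feature sequence. Soft-thresholding keeps the
    coefficients sparse.
    """

    def __init__(self, prompt_len: int, dim: int, threshold: float = 0.01):
        super().__init__()
        assert prompt_len % 2 == 0, "prompt_len must be even for 1-level Haar"
        self.threshold = threshold
        # Approximation (low-pass) and detail (high-pass) coefficients.
        self.approx = nn.Parameter(torch.randn(prompt_len // 2, dim) * 0.02)
        self.detail = nn.Parameter(torch.zeros(prompt_len // 2, dim))

    def _soft_threshold(self, c: torch.Tensor) -> torch.Tensor:
        # Shrink small coefficients toward zero to encourage sparsity.
        return torch.sign(c) * torch.clamp(c.abs() - self.threshold, min=0.0)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, dim) hidden states from a frozen front-end
        # such as XLSR; only the prompt coefficients are trainable.
        a = self._soft_threshold(self.approx)
        d = self._soft_threshold(self.detail)
        # Inverse 1-level Haar transform -> time-domain prompt (prompt_len, dim).
        even = (a + d) / math.sqrt(2.0)
        odd = (a - d) / math.sqrt(2.0)
        prompt = torch.stack((even, odd), dim=1).reshape(-1, a.shape[-1])
        prompt = prompt.unsqueeze(0).expand(features.shape[0], -1, -1)
        # Prepend the prompt tokens to the frozen feature sequence.
        return torch.cat((prompt, features), dim=1)


if __name__ == "__main__":
    # Stand-in for frozen XLSR hidden states: batch of 2, 100 frames, 1024 dims.
    frozen_feats = torch.randn(2, 100, 1024)
    prompt_tuner = WaveletSparsePrompt(prompt_len=16, dim=1024)
    out = prompt_tuner(frozen_feats)
    print(out.shape)  # torch.Size([2, 116, 1024])

In a setup like this, only the wavelet-domain coefficients (plus whatever back-end classifier follows, e.g. a Mamba-based module) would receive gradients, which is what keeps the trainable-parameter count low while the XLSR backbone stays frozen.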
Similar Papers
Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception
Sound
Finds fake voices in speech, sounds, and music.
SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition
Sound
Lets computers understand your feelings from your voice.
End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation
Sound
Finds fake voices in recordings.