A Spatial-Frequency Aware Multi-Scale Fusion Network for Real-Time Deepfake Detection
By: Libo Lv, Tianyi Wang, Mengxiao Huang and more
Potential Business Impact:
Finds fake videos fast, even on phones.
With the rapid advancement of real-time deepfake generation techniques, forged content is becoming increasingly realistic and widespread across applications like video conferencing and social media. Although state-of-the-art detectors achieve high accuracy on standard benchmarks, their heavy computational cost hinders real-time deployment in practical applications. To address this, we propose the Spatial-Frequency Aware Multi-Scale Fusion Network (SFMFNet), a lightweight yet effective architecture for real-time deepfake detection. We design a spatial-frequency hybrid aware module that jointly leverages spatial textures and frequency artifacts through a gated mechanism, enhancing sensitivity to subtle manipulations. A token-selective cross attention mechanism enables efficient multi-level feature interaction, while a residual-enhanced blur pooling structure helps retain key semantic cues during downsampling. Experiments on several benchmark datasets show that SFMFNet achieves a favorable balance between accuracy and efficiency, with strong generalization and practical value for real-time applications.
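The abstract does not specify how the gated mechanism is built, so the following is only a minimal NumPy sketch of one common way to fuse a spatial feature map with a frequency-domain view of it through a learned gate. Everything here is an assumption for illustration rather than the SFMFNet design: the function names `frequency_features` and `gated_fusion`, the use of the log-magnitude FFT spectrum as the frequency branch, and the 1x1-convolution-style gate weights `w_gate` and `b_gate` are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def frequency_features(x):
    """Log-magnitude of the per-channel 2D FFT.

    x has shape (C, H, W); the result has the same shape.
    (Assumed frequency branch; the paper may use a different transform.)
    """
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    return np.log1p(np.abs(spectrum))


def gated_fusion(spatial, freq, w_gate, b_gate):
    """Blend two (C, H, W) feature maps with a per-channel, per-pixel gate.

    w_gate: (C, 2C) weights acting like a 1x1 convolution over the
            concatenated branches (hypothetical parameterization).
    b_gate: (C,) bias.
    """
    stacked = np.concatenate([spatial, freq], axis=0)            # (2C, H, W)
    logits = np.einsum("oc,chw->ohw", w_gate, stacked)           # (C, H, W)
    gate = sigmoid(logits + b_gate[:, None, None])
    # gate near 1 favors spatial textures, gate near 0 favors frequency cues
    return gate * spatial + (1.0 - gate) * freq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels = 4
    spatial = rng.standard_normal((channels, 8, 8))
    freq = frequency_features(spatial)
    w_gate = 0.1 * rng.standard_normal((channels, 2 * channels))
    b_gate = np.zeros(channels)
    fused = gated_fusion(spatial, freq, w_gate, b_gate)
    print(fused.shape)
```

With all-zero gate weights the gate is 0.5 everywhere and the output is the plain average of the two branches; training would move the gate toward whichever branch is more informative at each location.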
Similar Papers
Towards Generalizable Deepfake Detection with Spatial-Frequency Collaborative Learning and Hierarchical Cross-Modal Fusion
CV and Pattern Recognition
Finds fake videos better, even new kinds.
MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching
CV and Pattern Recognition
Makes 3D vision work fast on phones.
SFANet: Spatial-Frequency Attention Network for Deepfake Detection
CV and Pattern Recognition
Finds fake videos better than before.