Context-aware TFL: A Universal Context-aware Contrastive Learning Framework for Temporal Forgery Localization
By: Qilin Yin, Wei Lu, Xiangyang Luo and more
Potential Business Impact:
Finds fake parts in videos.
Most research in multimedia forensics has focused on detecting forged audio-visual content and has achieved solid results. However, these works treat deepfake detection purely as a classification task and ignore the case where only some segments of a video are tampered with. Temporal forgery localization (TFL) of small fake audio-visual clips embedded in real videos remains challenging and is closer to realistic application scenarios. To address this, we propose a universal context-aware contrastive learning framework (UniCaCLF) for TFL. Our approach uses supervised contrastive learning to identify forged instants by means of anomaly detection, allowing precise localization of temporally forged segments. To this end, we propose a novel context-aware perception layer that uses a heterogeneous activation operation and an adaptive context updater to construct a context-aware contrastive objective, which enhances the discriminability of forged instant features by contrasting them with genuine instant features in terms of their distances to the global context. An efficient context-aware contrastive coding is introduced to further separate the features of genuine and forged instants in a supervised sample-by-sample manner, suppressing cross-sample influence to improve temporal forgery localization performance. Extensive experimental results on five public datasets demonstrate that the proposed UniCaCLF significantly outperforms state-of-the-art competing algorithms.
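The idea of contrasting instants by their distance to a global context can be illustrated with a minimal sketch. This is not the UniCaCLF implementation: the abstract does not give the loss, the context updater, or the activation operation, so everything below is an assumption made for illustration. Here the context is simply the mean of the genuine instant features within one sample, the distance is cosine distance, and the loss is a plain margin-based contrastive loss. The function names (context_distances, context_contrastive_loss, localize_segments) and the margin and threshold parameters are hypothetical.

```python
import numpy as np

def context_distances(features, context):
    """Cosine distance of each instant feature (T, D) to a context vector (D,)."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    c = context / (np.linalg.norm(context) + 1e-8)
    return 1.0 - f @ c

def context_contrastive_loss(features, labels, margin=0.5):
    """Hypothetical per-sample margin loss (not the paper's objective).

    Genuine instants (label 0) are pulled toward the context; forged
    instants (label 1) are pushed at least `margin` away from it.
    Assumes the sample contains at least one genuine instant.
    """
    labels = np.asarray(labels, dtype=bool)
    # Assumption: the global context is the mean of genuine instant features.
    context = features[~labels].mean(axis=0)
    d = context_distances(features, context)
    pull = d[~labels] ** 2
    push = np.maximum(0.0, margin - d[labels]) ** 2
    return float(np.concatenate([pull, push]).mean())

def localize_segments(scores, threshold):
    """Group consecutive instants with score > threshold into (start, end) pairs, end exclusive."""
    segments, start = [], None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t
        elif s <= threshold and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments
```

In this sketch, the loss is computed within a single video, which loosely mirrors the sample-by-sample supervision described above. At inference time the per-instant distances could serve as anomaly scores passed to localize_segments, though how the actual framework turns instant features into segment boundaries is not stated in the abstract.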
Similar Papers
Weakly-supervised Audio Temporal Forgery Localization via Progressive Audio-language Co-learning Network
Sound
Finds fake audio parts without needing perfect labels.
TACTFL: Temporal Contrastive Training for Multi-modal Federated Learning with Similarity-guided Model Aggregation
Distributed, Parallel, and Cluster Computing
Helps computers learn from mixed, unlabeled data.
Weakly Supervised Multimodal Temporal Forgery Localization via Multitask Learning
CV and Pattern Recognition
Finds fake videos even with few clues.