Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
By: Jeongmin Yu, Susang Kim, Kisu Lee, and more
Potential Business Impact:
Better at stopping fake faces from fooling cameras.
Recent face anti-spoofing (FAS) methods have shown remarkable cross-domain performance by employing vision-language models like CLIP. However, existing CLIP-based FAS models do not fully exploit CLIP's patch embedding tokens and thus miss critical spoofing clues. Moreover, these models rely on a single text prompt per class (e.g., 'live' or 'fake'), which limits generalization. To address these issues, we propose MVP-FAS, a novel framework incorporating two key modules: Multi-View Slot attention (MVS) and Multi-Text Patch Alignment (MTPA). Both modules utilize multiple paraphrased texts to generate generalized features and reduce dependence on domain-specific text. MVS extracts detailed local spatial features and global context from patch embeddings by leveraging diverse texts that offer multiple perspectives. MTPA aligns patches with multiple text representations to improve semantic robustness. Extensive experiments demonstrate that MVP-FAS achieves superior generalization performance, outperforming previous state-of-the-art methods on cross-domain datasets. Code: https://github.com/Elune001/MVP-FAS.
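The two modules described in the abstract lend themselves to a compact illustration. Below is a minimal PyTorch sketch, not the authors' implementation (see the repository above), of the two underlying ideas: averaging patch-text cosine similarity over several paraphrased prompts per class (MTPA-style alignment), and a simplified slot-attention pass whose slots are initialized from those paraphrased text embeddings (MVS-style pooling). The function names, the open_clip-style `encode_text` interface, and the single-head, GRU-free slot update are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def paraphrase_text_embeddings(clip_model, tokenizer, prompts_per_class, device="cpu"):
    """Encode K paraphrased prompts per class with a CLIP text encoder.

    Assumes an open_clip-style model exposing encode_text. Returns a tensor
    of shape [C, K, D] of L2-normalized text embeddings. `prompts_per_class`
    is a list of C lists, each holding K paraphrases, e.g.
    [["a photo of a real face", ...], ["a photo of a spoof face", ...]].
    """
    embs = []
    for prompts in prompts_per_class:
        tokens = tokenizer(prompts).to(device)          # [K, L]
        t = clip_model.encode_text(tokens).float()      # [K, D]
        embs.append(F.normalize(t, dim=-1))
    return torch.stack(embs)                            # [C, K, D]

def multi_text_patch_alignment(patch_tokens, text_embs, temperature=0.07):
    """Score each class by aligning every patch with every paraphrase.

    patch_tokens: [B, N, D] CLIP patch embeddings (projected to text space).
    text_embs:    [C, K, D] paraphrased text embeddings.
    Returns class logits [B, C]: cosine similarity averaged over the K
    paraphrases and the N patches, so no single prompt dominates.
    """
    p = F.normalize(patch_tokens, dim=-1)               # [B, N, D]
    t = F.normalize(text_embs, dim=-1)                  # [C, K, D]
    sim = torch.einsum("bnd,ckd->bnck", p, t)           # patch-text cosine sims
    return sim.mean(dim=(1, 3)) / temperature           # average over N and K

def text_guided_slot_attention(patch_tokens, text_embs, iters=3):
    """Simplified slot attention with slots initialized from paraphrased texts.

    The softmax runs over the slot axis, so slots compete for patches; each
    slot then aggregates its patches via a normalized weighted mean, mixing
    local patch detail with the global view its text prompt encodes.
    """
    b, n, d = patch_tokens.shape
    slots = text_embs.reshape(-1, d).expand(b, -1, -1).clone()   # [B, C*K, D]
    for _ in range(iters):
        attn = torch.einsum("bsd,bnd->bsn", slots, patch_tokens) / d ** 0.5
        attn = attn.softmax(dim=1)                      # slots compete per patch
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        slots = torch.einsum("bsn,bnd->bsd", attn, patch_tokens)
    return slots                                        # [B, C*K, D]
```

The sketch only illustrates why paraphrase averaging reduces reliance on any single, possibly domain-specific, prompt; the paper's full pipeline should be taken from the linked code.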
Similar Papers
FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models
CV and Pattern Recognition
Keeps face scanners safe from fake faces.
SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining
CV and Pattern Recognition
Stops fake faces from tricking cameras.