InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing
By: Kun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, et al.
Potential Business Impact:
Stops fake faces from fooling cameras.
Face anti-spoofing (FAS) aims to construct a robust system that can withstand diverse attacks. While recent efforts have concentrated mainly on cross-domain generalization, two significant challenges persist: limited semantic understanding of attack types and training redundancy across domains. We address the first by integrating vision-language models (VLMs) to enhance the perception of visual input. For the second, we employ a meta-domain strategy to learn a unified model that generalizes well across multiple domains. We propose InstructFLIP, a novel instruction-tuned framework that leverages VLMs to enhance generalization via textual guidance, trained solely on a single domain. At its core, InstructFLIP explicitly decouples instructions into content and style components: content-based instructions focus on the essential semantics of spoofing, while style-based instructions capture variations related to the environment and camera characteristics. Extensive experiments demonstrate the effectiveness of InstructFLIP, which outperforms SOTA models in accuracy and substantially reduces training redundancy across diverse domains in FAS. The project website is available at https://kunkunlin1221.github.io/InstructFLIP.
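To make the content/style decoupling concrete, below is a minimal PyTorch sketch, not the authors' implementation: a shared visual feature is queried by two separate instruction branches, one classifying attack semantics (content) and one classifying environment/camera variation (style). All module names, dimensions, and label counts (`DecoupledInstructionHead`, `n_content=5`, `n_style=4`, the 512-dim embeddings) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class DecoupledInstructionHead(nn.Module):
    """Hypothetical sketch of content/style instruction decoupling."""

    def __init__(self, vis_dim=512, txt_dim=512, n_content=5, n_style=4):
        super().__init__()
        # Stand-in for a visual backbone; a real system would use a VLM encoder.
        self.visual = nn.Linear(3 * 224 * 224, vis_dim)
        # Content branch: semantics of the attack (e.g., live, print, replay, mask).
        self.content_head = nn.Sequential(
            nn.Linear(vis_dim + txt_dim, vis_dim), nn.ReLU(),
            nn.Linear(vis_dim, n_content),
        )
        # Style branch: environment/camera variation (e.g., illumination, sensor).
        self.style_head = nn.Sequential(
            nn.Linear(vis_dim + txt_dim, vis_dim), nn.ReLU(),
            nn.Linear(vis_dim, n_style),
        )

    def forward(self, image, content_instr, style_instr):
        # image: (B, 3, 224, 224); *_instr: (B, txt_dim) instruction embeddings.
        v = self.visual(image.flatten(1))
        # Each branch conditions the shared visual feature on its own instruction.
        content_logits = self.content_head(torch.cat([v, content_instr], dim=-1))
        style_logits = self.style_head(torch.cat([v, style_instr], dim=-1))
        return content_logits, style_logits

model = DecoupledInstructionHead()
img = torch.randn(2, 3, 224, 224)
c_instr = torch.randn(2, 512)  # embedding of a content question about spoof type
s_instr = torch.randn(2, 512)  # embedding of a style question about capture conditions
content_logits, style_logits = model(img, c_instr, s_instr)
print(content_logits.shape, style_logits.shape)  # torch.Size([2, 5]) torch.Size([2, 4])
```

In this toy setup, supervising the two branches with separate losses keeps attack semantics and capture-condition cues from being entangled in a single prediction, which is the intuition behind the paper's decoupled instructions.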
Similar Papers
SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining
CV and Pattern Recognition
Stops fake faces from tricking cameras.
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
CV and Pattern Recognition
Helps computers tell real faces from fake ones.
Steering Vision-Language Pre-trained Models for Incremental Face Presentation Attack Detection
CV and Pattern Recognition
Stops fake faces from fooling security cameras.