
Learning Representation and Synergy Invariances: A Provable Framework for Generalized Multimodal Face Anti-Spoofing

Published: November 18, 2025 | arXiv ID: 2511.14157v1

By: Xun Lin, Shuai Wang, Yi Yu and more

Potential Business Impact:

Keeps fake faces from fooling face scanners.

Business Areas:

Image Recognition Data and Analytics, Software

Multimodal Face Anti-Spoofing (FAS) methods, which integrate multiple visual modalities, often suffer even more severe performance degradation than unimodal FAS when deployed in unseen domains. This is mainly due to two overlooked risks that affect cross-domain multimodal generalization. The first is the modal representation invariant risk, i.e., whether representations remain generalizable under domain shift. We theoretically show that the inherent class asymmetry in FAS (diverse spoofs vs. compact reals) enlarges the upper bound of generalization error, and this effect is further amplified in multimodal settings. The second is the modal synergy invariant risk, where models overfit to domain-specific inter-modal correlations. Such spurious synergy cannot generalize to unseen attacks in target domains, leading to performance drops. To solve these issues, we propose a provable framework, namely Multimodal Representation and Synergy Invariance Learning (RiSe). For representation risk, RiSe introduces Asymmetric Invariant Risk Minimization (AsyIRM), which learns an invariant spherical decision boundary in radial space to fit asymmetric distributions, while preserving domain cues in angular space. For synergy risk, RiSe employs Multimodal Synergy Disentanglement (MMSD), a self-supervised task enhancing intrinsic, generalizable modal features via cross-sample mixing and disentanglement. Theoretical analysis and experiments verify RiSe, which achieves state-of-the-art cross-domain performance.
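The abstract's idea of a spherical decision boundary in radial space, with domain cues left in angular space, can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's AsyIRM implementation: the function names `radial_angular_split` and `predict_live`, the fixed `radius_threshold`, and the choice to place live samples inside the sphere (suggested by "compact reals") are all hypothetical.

```python
import numpy as np

def radial_angular_split(features):
    """Split each feature vector into its norm (radial part)
    and its unit direction (angular part)."""
    features = np.asarray(features, dtype=float)
    radii = np.linalg.norm(features, axis=1)
    # Guard against division by zero for all-zero vectors.
    directions = features / np.maximum(radii, 1e-12)[:, None]
    return radii, directions

def predict_live(features, radius_threshold):
    """Hypothetical decision rule: a sample is labeled live (True)
    when its feature norm falls inside the sphere. Only the radial
    part is used, so the direction is free to carry other cues."""
    radii, _ = radial_angular_split(features)
    return radii <= radius_threshold

if __name__ == "__main__":
    # Assumption for illustration: one compact sample and one far-out sample.
    feats = [[0.1, 0.2], [3.0, 4.0]]
    print(predict_live(feats, radius_threshold=1.0))
```

In this sketch the class decision depends on a single scalar per sample, so one compact class can be enclosed while a diverse class spreads in any direction outside it. How the radius is learned and kept invariant across domains is the paper's contribution and is not reproduced here.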

Representation Space Constrained Learning with Modality Decoupling for Multimodal Object Detection

CV and Pattern Recognition

Helps computers see better using different senses.

19 Nov 2025 2

89%

Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing

CV and Pattern Recognition

Stops fake faces from tricking security cameras.

14 May 2025 1

88%

DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing

CV and Pattern Recognition

Keeps fake faces from fooling security cameras.

1 Mar 2025 1


Country of Origin

🇨🇳 China

Page Count

22 pages
