FMVP: Masked Flow Matching for Adversarial Video Purification
By: Duoxun Tang, Xueyi Zhang, Chak Hin Wang and more
Potential Business Impact:
Cleans up videos that were subtly altered to trick computers.
Video recognition models remain vulnerable to adversarial attacks, while existing diffusion-based purification methods suffer from inefficient sampling and curved trajectories. Directly regressing clean videos from adversarial inputs often fails to recover faithful content due to the subtle nature of perturbations; this necessitates physically shattering the adversarial structure. Therefore, we propose Flow Matching for Adversarial Video Purification (FMVP). FMVP physically shatters global adversarial structures via a masking strategy and reconstructs clean video dynamics using Conditional Flow Matching (CFM) with an inpainting objective. To further decouple semantic content from adversarial noise, we design a Frequency-Gated Loss (FGL) that explicitly suppresses high-frequency adversarial residuals while preserving low-frequency fidelity. We design Attack-Aware and Generalist training paradigms to handle known and unknown threats, respectively. Extensive experiments on UCF-101 and HMDB-51 demonstrate that FMVP outperforms state-of-the-art methods (DiffPure, Defense Patterns (DP), Temporal Shuffling (TS) and FlowPure), achieving robust accuracy exceeding 87% against PGD and 89% against CW attacks. Furthermore, FMVP demonstrates superior robustness against adaptive attacks (DiffHammer) and functions as a zero-shot adversarial detector, attaining detection accuracies of 98% for PGD and 79% for highly imperceptible CW attacks.
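The three ingredients named in the abstract (masking, a flow-matching regression target, and a frequency-weighted loss) can be illustrated with a minimal NumPy sketch. Nothing below is taken from the paper's implementation: the function names, the patch-wise mask, the linear interpolation path, the radial frequency cutoff and the band weights are all assumptions chosen for illustration only.

```python
import numpy as np

def random_patch_mask(shape, patch=4, keep_ratio=0.5, rng=None):
    """Hypothetical patch-wise binary mask over (T, H, W): 1 = keep, 0 = masked."""
    rng = np.random.default_rng() if rng is None else rng
    t, h, w = shape
    gh, gw = -(-h // patch), -(-w // patch)  # ceil division
    coarse = (rng.random((t, gh, gw)) < keep_ratio).astype(np.float32)
    # Upsample the coarse grid so each cell covers a patch x patch block.
    mask = coarse.repeat(patch, axis=1).repeat(patch, axis=2)
    return mask[:, :h, :w]

def cfm_pair(x0, x1, t):
    """Linear-path flow-matching pair (an assumed, commonly used choice).

    x0: noise sample, x1: clean video, t: scalar in [0, 1].
    Returns the interpolated point x_t and the target velocity x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

def frequency_gated_loss(pred, target, cutoff=0.25, w_low=1.0, w_high=2.0):
    """Assumed stand-in for a frequency-gated loss: weights the squared error
    separately in a low- and a high-frequency band of the per-frame 2D FFT."""
    err = np.fft.fft2(pred - target, axes=(-2, -1), norm="ortho")
    fy = np.fft.fftfreq(pred.shape[-2])[:, None]
    fx = np.fft.fftfreq(pred.shape[-1])[None, :]
    low = np.sqrt(fy ** 2 + fx ** 2) <= cutoff  # radial low-pass gate
    power = np.abs(err) ** 2
    return w_low * (power * low).mean() + w_high * (power * ~low).mean()

# Toy usage on an 8-frame, 16x16 grayscale clip.
rng = np.random.default_rng(0)
clean = rng.random((8, 16, 16)).astype(np.float32)
mask = random_patch_mask(clean.shape, rng=rng)
masked_input = clean * mask          # conditioning signal for inpainting
noise = rng.standard_normal(clean.shape).astype(np.float32)
x_t, v_target = cfm_pair(noise, clean, t=0.3)
```

In this sketch a network would be trained to predict `v_target` from `x_t`, `t` and `masked_input`. With the orthonormal FFT and both weights set to 1, the loss reduces to the ordinary mean squared error, so the weights only control how strongly each band is emphasized.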
Similar Papers
FlowPure: Continuous Normalizing Flows for Adversarial Purification
Machine Learning (CS)
Cleans up bad computer guesses to make them right.
From Diffusion to One-Step Generation: A Comparative Study of Flow-Based Models with Application to Image Inpainting
CV and Pattern Recognition
Makes pictures from scratch in one step.
Flowception: Temporally Expansive Flow Matching for Video Generation
CV and Pattern Recognition
Makes videos by adding and fixing pictures.