Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly Detection
By: Satoshi Hashimoto, Hitoshi Nishimura, Yanan Wang, and others
Potential Business Impact:
Teaches computers to spot trouble without real examples.
Deploying video anomaly detection in practice is hampered by the scarcity and high collection cost of real abnormal footage. We address this by training without any real abnormal videos while evaluating under the standard weakly supervised split, introducing PA-VAD, a generation-driven approach that learns a detector from synthesized pseudo-abnormal videos paired with real normal videos, using only a small set of real normal images to drive synthesis. For synthesis, we select class-relevant initial images with CLIP and refine textual prompts with a vision-language model to improve fidelity and scene consistency before invoking a video diffusion model. For training, we mitigate the excessive spatiotemporal magnitude of synthesized anomalies with a domain-aligned regularization module that combines domain alignment with memory usage-aware updates. Extensive experiments show that our approach reaches 98.2% on ShanghaiTech and 82.5% on UCF-Crime, surpassing the strongest real-abnormal method on ShanghaiTech by +0.6% and outperforming the UVAD state of the art on UCF-Crime by +1.9%. These results demonstrate that high-accuracy anomaly detection can be achieved without collecting real anomalies, offering a practical path toward scalable deployment.
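The selection step described above (choosing class-relevant initial images with CLIP before synthesis) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes CLIP image embeddings for candidate normal frames and a CLIP text embedding for the target anomaly class have already been computed (the actual model calls are omitted), and it simply ranks candidates by cosine similarity.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def select_class_relevant(image_embs, text_emb, k=2):
    """Rank candidate images by similarity to the anomaly-class
    text embedding and return the indices of the top-k matches.
    `image_embs` and `text_emb` are assumed to be precomputed
    CLIP embeddings (hypothetical inputs for illustration)."""
    ranked = sorted(
        range(len(image_embs)),
        key=lambda i: cosine(image_embs[i], text_emb),
        reverse=True,
    )
    return ranked[:k]
```

In the full pipeline, the selected images would then seed prompt refinement by a vision-language model and conditioning of the video diffusion model; those stages depend on specific model choices and are not shown here.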
Similar Papers
GV-VAD : Exploring Video Generation for Weakly-Supervised Video Anomaly Detection
CV and Pattern Recognition
Spots strange events in videos automatically.
Learning to Tell Apart: Weakly Supervised Video Anomaly Detection via Disentangled Semantic Alignment
CV and Pattern Recognition
Finds unusual events in videos better.
RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection
CV and Pattern Recognition
Finds weird things in videos by watching motion and meaning.