Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models
By: He Huang , Zixuan Hu , Dongxiao Li and more
Potential Business Impact:
Finds weird things in videos faster.
Video anomaly detection (VAD) plays a vital role in real-world applications such as security surveillance, autonomous driving, and industrial monitoring. Recent advances in large pre-trained models have opened new opportunities for training-free VAD by leveraging rich prior knowledge and general reasoning capabilities. However, existing studies typically rely on dense frame-level inference, incurring high computational costs and latency. This raises a fundamental question: Is dense reasoning truly necessary when using powerful pre-trained models in VAD systems? To answer this, we propose ReCoVAD, a novel framework inspired by the dual reflex and conscious pathways of the human nervous system, enabling selective frame processing to reduce redundant computation. ReCoVAD consists of two core pathways: (i) a Reflex pathway that uses a lightweight CLIP-based module to fuse visual features with prototype prompts and produce decision vectors, which query a dynamic memory of past frames and anomaly scores for fast response; and (ii) a Conscious pathway that employs a medium-scale vision-language model to generate textual event descriptions and refined anomaly scores for novel frames. It continuously updates the memory and prototype prompts, while an integrated large language model periodically reviews accumulated descriptions to identify unseen anomalies, correct errors, and refine prototypes. Extensive experiments show that ReCoVAD achieves state-of-the-art training-free performance while processing only 28.55\% and 16.04\% of the frames used by previous methods on the UCF-Crime and XD-Violence datasets, demonstrating that sparse reasoning is sufficient for effective large-model-based VAD.
Similar Papers
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
CV and Pattern Recognition
Helps AI understand why weird things happen in videos.
RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection
CV and Pattern Recognition
Finds weird things in videos by watching motion and meaning.
DUAL-VAD: Dual Benchmarks and Anomaly-Focused Sampling for Video Anomaly Detection
CV and Pattern Recognition
Finds weird things happening in videos.