VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

Published: May 29, 2025 | arXiv ID: 2505.23504v1

By: Liyun Zhu, Qixiang Chen, Xi Shen and more

Potential Business Impact:

Helps computers understand weird things happening in videos.

Business Areas:

Image Recognition, Data and Analytics, Software

Video Anomaly Understanding (VAU) is essential for applications such as smart cities, security surveillance, and disaster alert systems, yet remains challenging due to its demand for fine-grained spatio-temporal perception and robust reasoning under ambiguity. Despite advances in anomaly detection, existing methods often lack interpretability and struggle to capture the causal and contextual aspects of abnormal events. This limitation is further compounded by the absence of comprehensive benchmarks for evaluating reasoning ability in anomaly scenarios. To address both challenges, we introduce VAU-R1, a data-efficient framework built upon Multimodal Large Language Models (MLLMs), which enhances anomaly reasoning through Reinforcement Fine-Tuning (RFT). Besides, we propose VAU-Bench, the first Chain-of-Thought benchmark tailored for video anomaly reasoning, featuring multiple-choice QA, detailed rationales, temporal annotations, and descriptive captions. Empirical results show that VAU-R1 significantly improves question answering accuracy, temporal grounding, and reasoning coherence across diverse contexts. Together, our method and benchmark establish a strong foundation for interpretable and reasoning-aware video anomaly understanding. Our code is available at https://github.com/GVCLab/VAU-R1.

PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding

CV and Pattern Recognition

Helps computers understand weird video moments.

6 Jan 2026 1

91%

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

CV and Pattern Recognition

Helps AI understand why weird things happen in videos.

26 May 2025 1

89%

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

CV and Pattern Recognition

Makes AI understand videos better, like a detective.

9 Apr 2025 1

Country of Origin

🇦🇺 Australia

Repos / Data Links

github.com

Page Count

23 pages
