VAGU & GtS: LLM-Based Benchmark and Framework for Joint Video Anomaly Grounding and Understanding

Published: July 29, 2025 | arXiv ID: 2507.21507v1

By: Shibo Gao, Peipei Yang, Yangyang Liu and more

Potential Business Impact:

Finds weird things in videos, explains them.

Business Areas:

Image Recognition Data and Analytics, Software

Video Anomaly Detection (VAD) aims to identify anomalous events in videos and accurately determine their time intervals. Current VAD methods mainly fall into two categories: traditional DNN-based approaches that focus on temporal localization, and LLM-based approaches that emphasize semantic understanding. Both anomaly understanding and grounding are essential for comprehensive video anomaly detection and can complement each other. However, no existing model or dataset supports both tasks simultaneously. To address this, we introduce VAGU (Video Anomaly Grounding and Understanding), the first benchmark to integrate both tasks. Each VAGU instance includes annotations for anomaly category, semantic explanation, precise temporal grounding and Video QA. We also provide multiple-choice Video QA for objective evaluation. Based on this dataset, we propose Glance then Scrutinize (GtS), a training-free framework guided by textual prompts. The framework first enables coarse localization of high-probability anomalous regions, followed by detailed anomaly interpretation and temporal boundary refinement. Additionally, we propose the JeAUG metric, which jointly evaluates semantic interpretability and temporal precision, overcoming the limitations of traditional metrics. Extensive experiments verify the effectiveness of our benchmark, framework, and evaluation metric.
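
The abstract does not define JeAUG, so the sketch below does not attempt to reproduce it. It only illustrates what "temporal precision" commonly means in temporal grounding work: the intersection-over-union (IoU) between a predicted and an annotated time interval. The function name `temporal_iou` and the `(start, end)` tuple layout in seconds are assumptions made for this example, not part of the paper.

```python
def temporal_iou(pred, gt):
    """Illustrative temporal IoU between two (start, end) intervals in seconds.

    This is a generic overlap measure, not the paper's JeAUG metric.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    if pred_end < pred_start or gt_end < gt_start:
        raise ValueError("interval end must not precede its start")

    # Overlap length is zero when the intervals are disjoint.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union length = sum of both lengths minus the shared part.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0
```

For example, a prediction of 2 s to 6 s against an annotation of 4 s to 8 s shares 2 s out of a 6 s union, giving an IoU of one third. A joint metric of the kind the abstract describes would have to combine such a temporal score with a measure of explanation quality.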

GV-VAD: Exploring Video Generation for Weakly-Supervised Video Anomaly Detection

CV and Pattern Recognition

Spots strange events in videos automatically.

1 Aug 2025 2

90%

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

CV and Pattern Recognition

Finds exact moments in videos from descriptions.

19 Oct 2025 2

90%

DUAL-VAD: Dual Benchmarks and Anomaly-Focused Sampling for Video Anomaly Detection

CV and Pattern Recognition

Finds weird things happening in videos.

15 Sep 2025 0

Page Count

21 pages
