Chain-of-Anomaly Thoughts with Large Vision-Language Models
By: Pedro Domingos, João Pereira, Vasco Lopes, and more
Automated video surveillance with Large Vision-Language Models is limited by their inherent bias towards normality, which often causes them to miss crimes. While Chain-of-Thought reasoning strategies show significant potential for improving performance on language tasks, the lack of inductive anomaly biases in their reasoning further steers the models towards normal interpretations. To address this, we propose Chain-of-Anomaly-Thoughts (CoAT), a multi-agent reasoning framework that introduces an inductive criminal bias into the reasoning process through a final, anomaly-focused classification layer. Our method significantly improves Anomaly Detection, boosting the F1-score by 11.8 p.p. on challenging low-resolution footage, and improves Anomaly Classification by 3.78 p.p. on high-resolution videos.
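The abstract's pipeline — a chain of reasoning agents whose output feeds a final, anomaly-focused classification stage — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `vlm` callable, the prompt wording, and the `toy_vlm` stand-in are all hypothetical assumptions introduced here to show the structure.

```python
from typing import Callable

def coat_pipeline(vlm: Callable[[str], str], frame_description: str) -> str:
    """Chain reasoning prompts, then apply an anomaly-biased final classifier.

    Hypothetical sketch of the CoAT idea: earlier stages reason freely
    (and tend towards normal interpretations); the last stage explicitly
    primes the model to consider anomalous/criminal explanations.
    """
    # Stage 1: open-ended scene description (normality-prone by default).
    scene = vlm(f"Describe the activities in this scene: {frame_description}")
    # Stage 2: step-by-step reasoning over the description.
    reasoning = vlm(f"Think step by step about what is happening: {scene}")
    # Final stage: anomaly-focused classification layer that counters
    # the model's bias towards normal explanations.
    verdict = vlm(
        "Assume an anomaly may be present. Based on the reasoning below, "
        "classify the scene as 'normal' or name the anomaly type.\n"
        f"{reasoning}"
    )
    return verdict

# Toy deterministic stand-in for a real VLM, for demonstration only.
def toy_vlm(prompt: str) -> str:
    if prompt.startswith("Assume an anomaly"):
        return "anomaly: shoplifting"
    return prompt.split(":", 1)[-1].strip()

print(coat_pipeline(toy_vlm, "a person concealing items in a store"))
```

In a real setting, each stage could be served by a separate agent (or a separate prompt to the same VLM); the key design choice is that the anomaly prior is injected only at the final classification layer rather than throughout the chain.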