AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization
By: Jingyi Liao, Yongyi Su, Rong-Cheng Tu and more
Potential Business Impact:
Finds hidden flaws in things better.
While Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities across diverse domains, their application to specialized anomaly detection (AD) remains constrained by domain adaptation challenges. Existing Group Relative Policy Optimization (GRPO) based approaches suffer from two critical limitations: inadequate training data utilization when models produce uniform responses, and insufficient supervision over reasoning processes, which encourages immediate binary decisions without deliberative analysis. We propose a comprehensive framework addressing these limitations through two synergistic innovations. First, we introduce a multi-stage deliberative reasoning process that guides models from region identification to focused examination, generating diverse response patterns essential for GRPO optimization while enabling structured supervision over analytical workflows. Second, we develop a fine-grained reward mechanism incorporating classification accuracy and localization supervision, transforming binary feedback into continuous signals that distinguish genuine analytical insight from spurious correctness. Comprehensive evaluation across multiple industrial datasets demonstrates substantial performance improvements in adapting general vision-language models to specialized anomaly detection. Our method achieves superior accuracy with efficient adaptation of existing annotations, effectively bridging the gap between general-purpose MLLM capabilities and the fine-grained visual discrimination required for detecting subtle manufacturing defects and structural irregularities.
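The first limitation named in the abstract can be made concrete with a small sketch. GRPO scores each sampled response relative to the other responses in its group, so when every response in a group earns the same binary reward, every advantage is zero and that training example contributes no learning signal. The Python below is only an illustration under stated assumptions, not the authors' implementation: the function names, the standard-deviation normalization, the box format `(x1, y1, x2, y2)`, the label string `"anomalous"`, and the weight `w_loc` are all hypothetical choices made for this example. It combines a classification term with an IoU-based localization term to show how a continuous reward can separate responses that a binary reward would treat as identical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (GRPO-style).

    Assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def fine_grained_reward(pred_label, true_label, pred_box, true_box, w_loc=0.5):
    """Hypothetical reward: classification correctness plus weighted IoU.

    The weighting and the additive form are assumptions for illustration;
    the abstract only states that classification accuracy and localization
    supervision are both incorporated.
    """
    cls_term = 1.0 if pred_label == true_label else 0.0
    if pred_box is not None and true_box is not None:
        loc_term = iou(pred_box, true_box)
    else:
        loc_term = 0.0
    return cls_term + w_loc * loc_term


if __name__ == "__main__":
    # Four responses that all answer "anomalous" correctly: a binary reward
    # is uniform, so every group-relative advantage collapses to zero.
    binary_rewards = [1.0, 1.0, 1.0, 1.0]
    print(group_relative_advantages(binary_rewards))

    # The same four responses, now also scored on where they localize the
    # defect. Rewards differ, so the advantages become informative.
    true_box = (0, 0, 2, 2)
    predicted_boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6), None]
    graded_rewards = [
        fine_grained_reward("anomalous", "anomalous", box, true_box)
        for box in predicted_boxes
    ]
    print(graded_rewards)
    print(group_relative_advantages(graded_rewards))
```

In this sketch a response that reaches the right label while pointing at the wrong region scores lower than one that localizes the defect, which is one way to read the abstract's distinction between genuine analytical insight and spurious correctness.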
Similar Papers
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection
CV and Pattern Recognition
Finds hidden flaws in factory pictures automatically.
LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection
CV and Pattern Recognition
Finds weird patterns by thinking like a detective.
Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization
Cryptography and Security
Finds computer bugs better by teaching AI.