GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
By: Yue Liu, Shengfang Zhai, Mingzhe Du and more
Potential Business Impact:
Makes AI safer by teaching it to think first.
To enhance the safety of VLMs, this paper introduces GuardReasoner-VL, a novel reasoning-based VLM guard model. The core idea is to incentivize the guard model, via online RL, to reason deliberately before making moderation decisions. First, we construct GuardReasoner-VLTrain, a reasoning corpus with 123K samples and 631K reasoning steps, spanning text, image, and text-image inputs. We then use it to cold-start the model's reasoning ability via SFT, and further enhance its moderation reasoning through online RL. Concretely, to increase the diversity and difficulty of samples, we conduct rejection sampling followed by data augmentation via the proposed safety-aware data concatenation. We also use a dynamic clipping parameter to encourage exploration in early stages and exploitation in later stages. To balance performance and token efficiency, we design a length-aware safety reward that integrates accuracy, format, and token cost. Extensive experiments demonstrate the superiority of our model: it surpasses the runner-up by 19.27% F1 score on average. We release the data, code, and models (3B/7B) of GuardReasoner-VL at https://github.com/yueliu1999/GuardReasoner-VL/
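The abstract names three training ingredients without giving formulas: safety-aware data concatenation, a dynamic clipping parameter, and a length-aware safety reward. The Python sketch below shows one way each could look, as an illustration only. Every function name, the linear schedule, the default values, and the specific reward combination are assumptions made for this example, as is the rule that a concatenated sample is harmful if either part is harmful. None of these is taken from the paper or its released code.

```python
def concat_samples(sample_a, sample_b):
    """Hypothetical safety-aware concatenation of two moderation samples.

    Assumption: each sample is a dict with "prompt" (str) and "harmful" (bool),
    and the merged sample is harmful if either part is harmful.
    """
    return {
        "prompt": sample_a["prompt"] + "\n" + sample_b["prompt"],
        "harmful": sample_a["harmful"] or sample_b["harmful"],
    }


def dynamic_clip(step, total_steps, eps_start=0.3, eps_end=0.1):
    """Hypothetical clip-range schedule for a PPO-style objective.

    Assumption: a wide clip range early (more exploration) that shrinks
    linearly to a narrow one late (more exploitation).
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return eps_start + (eps_end - eps_start) * frac


def length_aware_reward(correct, well_formatted, n_tokens,
                        max_tokens=1024, length_weight=0.2):
    """Hypothetical reward combining accuracy, format, and token cost.

    Assumption: a malformed output earns nothing, and a correct output
    earns slightly less the more tokens it spends.
    """
    if not well_formatted:
        return 0.0
    accuracy = 1.0 if correct else 0.0
    cost = min(n_tokens / max_tokens, 1.0)
    return accuracy * (1.0 - length_weight * cost)


if __name__ == "__main__":
    merged = concat_samples(
        {"prompt": "How do I bake bread?", "harmful": False},
        {"prompt": "<some harmful request>", "harmful": True},
    )
    print(merged["harmful"])
    print(dynamic_clip(0, 100), dynamic_clip(100, 100))
    print(length_aware_reward(True, True, 128), length_aware_reward(True, True, 900))
```

In this sketch a correct, well-formatted answer with a short reasoning trace scores higher than an equally correct but longer one, which is the performance versus token-efficiency trade-off the abstract describes.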
Similar Papers
GuardReasoner: Towards Reasoning-based LLM Safeguards
Cryptography and Security
Teaches AI to think better to avoid mistakes.
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
Machine Learning (CS)
Makes AI safer by understanding pictures and words.