Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward
By: Jianlong Chen, Daocheng Fu, Shengze Xu, and more
Potential Business Impact:
Teaches AI models to solve geometry problems with more rigorous, step-by-step reasoning.
Multimodal Large Language Models (MLLMs) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigorous deduction. To address this, we introduce a paradigm shift toward subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine that converts abstract proofs into verifiable numeric subgoals. This structure reveals a critical divergence between reasoning quality and outcome accuracy. Leveraging this, we propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse outcome signals with dense rewards based on the Skeleton Rate. Experiments show that SGVR not only enhances geometric performance (+9.7%) but also generalizes strongly, transferring gains to general math (+8.0%) and other general reasoning tasks (+2.8%).
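The core idea of a subgoal-verifiable reward can be sketched in a few lines: instead of a sparse 0/1 signal on the final answer, credit the model for each intermediate numeric subgoal it gets right. The sketch below is illustrative only; the function name, subgoal representation, and the use of "fraction of verified subgoals" as a stand-in for the paper's Skeleton Rate are all assumptions, not the authors' implementation.

```python
import math

def subgoal_reward(predicted, reference, tol=1e-6):
    """Dense reward: fraction of reference numeric subgoals whose value
    the model's prediction matches (a hypothetical stand-in for the
    paper's Skeleton Rate; the exact definition may differ)."""
    if not reference:
        return 0.0
    hits = sum(
        1 for name, target in reference.items()
        if name in predicted
        and math.isclose(predicted[name], target, rel_tol=tol, abs_tol=tol)
    )
    return hits / len(reference)

# A sparse outcome reward would credit only the final answer; the dense
# subgoal reward also credits correct intermediate deductions.
reference = {"angle_ABC": 60.0, "len_BD": 5.0, "final_area": 25.0}
prediction = {"angle_ABC": 60.0, "len_BD": 4.0, "final_area": 25.0}
print(subgoal_reward(prediction, reference))  # 2 of 3 subgoals verified
```

Under this scheme a solution that reaches the right answer through wrong intermediate steps scores lower than one with a fully verified deduction chain, which is exactly the distinction outcome-only supervision cannot make.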
Similar Papers
Generalizable Geometric Image Caption Synthesis
Artificial Intelligence
Teaches computers to solve geometry problems.
From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Machine Learning (CS)
Helps AI check its own thinking better.
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
Computation and Language
Teaches computers to solve math problems better.