Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward
By: Jianlong Chen, Daocheng Fu, Shengze Xu, and more
Potential Business Impact:
Teaches AI models to solve geometry problems with more rigorous, step-by-step reasoning.
Multimodal Large Language Models (MLLMs) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigorous deduction. To address this, we introduce a paradigm shift toward subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine that converts abstract proofs into verifiable numeric subgoals. This structure reveals a critical divergence between reasoning quality and outcome accuracy. Leveraging this, we propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse outcome signals with dense rewards based on the Skeleton Rate. Experiments show that SGVR not only enhances geometric performance (+9.7%) but also generalizes strongly, transferring gains to general math (+8.0%) and other general reasoning tasks (+2.8%).
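The core idea of a subgoal-verifiable reward can be sketched in a few lines: instead of a sparse 0/1 signal on the final answer, credit the model for each intermediate numeric subgoal it gets right. The sketch below is illustrative only; the function name, subgoal representation, and the use of "fraction of verified subgoals" as a stand-in for the paper's Skeleton Rate are all assumptions, not the authors' implementation.

```python
import math

def subgoal_reward(predicted, reference, tol=1e-6):
    """Dense reward: fraction of reference numeric subgoals whose value
    the model's prediction matches (a hypothetical stand-in for the
    paper's Skeleton Rate; the exact definition may differ)."""
    if not reference:
        return 0.0
    hits = sum(
        1 for name, target in reference.items()
        if name in predicted
        and math.isclose(predicted[name], target, rel_tol=tol, abs_tol=tol)
    )
    return hits / len(reference)

# A sparse outcome reward would credit only the final answer; the dense
# subgoal reward also credits correct intermediate deductions.
reference = {"angle_ABC": 60.0, "len_BD": 5.0, "final_area": 25.0}
prediction = {"angle_ABC": 60.0, "len_BD": 4.0, "final_area": 25.0}
print(subgoal_reward(prediction, reference))  # 2 of 3 subgoals verified
```

Under this scheme a solution that reaches the right answer through wrong intermediate steps scores lower than one with a fully verified deduction chain, which is exactly the distinction outcome-only supervision cannot make.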
Similar Papers
Generalizable Geometric Image Caption Synthesis
Artificial Intelligence
Teaches computers to solve geometry problems.
From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Machine Learning (CS)
Helps AI check its own thinking better.
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
Computation and Language
Teaches computers to solve math problems better.