
GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation

Published: October 13, 2025 | arXiv ID: 2510.11020v1

By: Shasha Guo, Liang Pang, Xi Wang and more

Potential Business Impact:

Helps computers solve hard geometry problems.

Business Areas:

Image Recognition, Data and Analytics, Software

Technical Abstract

Auxiliary lines are essential for solving complex geometric problems but remain challenging for large vision-language models (LVLMs). Rather than editing diagrams to draw auxiliary lines, which current image editing models struggle to render with geometric precision, we generate textual descriptions of auxiliary-line constructions to better align with the representational strengths of LVLMs. To bridge the gap between textual descriptions and spatial structure, we propose a reinforcement learning framework that enhances diagram-text alignment. At the core of our approach is a cross-modal reward that evaluates how well the generated auxiliary-line description for an original diagram matches a ground-truth auxiliary-line diagram. Built on this reward, we present GeoVLMath, an open-source LVLM tailored to auxiliary-line reasoning in solid geometry. This fine-grained signal drives a GRPO-based RL stage, yielding precise diagram-text alignment. To support training, we develop a scalable data creation pipeline and construct AuxSolidMath, a dataset of 3,018 real-exam geometry problems with paired diagrams and aligned textual fields. At the 3B and 7B scales, GeoVLMath achieves competitive and often superior performance compared with strong open-source and proprietary LVLMs on auxiliary-line reasoning benchmarks.
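
The abstract mentions a GRPO-based RL stage driven by the cross-modal reward. As a rough illustration of the group-relative step used by GRPO in general (not the authors' implementation), the sketch below normalizes the reward scores of several sampled auxiliary-line descriptions for one diagram against their own group statistics. The reward values, the function name `group_relative_advantages`, and the choice of population standard deviation are assumptions made for illustration; the paper's actual reward model and hyperparameters are not given on this page.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sample group.

    `rewards` holds one scalar score per sampled description of the same
    diagram. `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical cross-modal reward scores in [0, 1] for four sampled
# auxiliary-line descriptions of one diagram (illustrative values only).
rewards = [0.9, 0.4, 0.4, 0.1]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Descriptions scoring above the group mean receive a positive advantage and are reinforced, while those below the mean are discouraged. Because the baseline comes from the group itself, this style of update needs no separate value network.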

GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines

Artificial Intelligence

Teaches computers to solve hard math drawings.

8 Aug 2025

90%

Generalizable Geometric Image Caption Synthesis

Artificial Intelligence

Teaches computers to solve geometry problems.

18 Sep 2025

89%

Geoint-R1: Formalizing Multimodal Geometric Reasoning with Dynamic Auxiliary Constructions

Artificial Intelligence

Helps computers solve hard geometry problems.

5 Aug 2025


Country of Origin

🇨🇳 China

Repos / Data Links

github.com huggingface.co

Page Count

22 pages
