GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
By: Guizhen Chen, Weiwen Xu, Hao Zhang, and more
Potential Business Impact:
Helps AI perceive pictures accurately for better problem-solving.
Recent advances in reinforcement learning (RL) have enhanced the reasoning abilities of large language models (LLMs), yet their impact on multimodal LLMs (MLLMs) remains limited. In vision-intensive tasks such as geometric reasoning in particular, MLLMs hallucinate frequently, leading to inaccurate reasoning. We attribute this to a perceptual bottleneck in MLLMs, which caps the benefits of reasoning training. To quantify this bottleneck, we design the Geo-Perception Question-Answering (GeoPQA) benchmark, which targets basic geometric concepts and spatial relationships. Experiments on GeoPQA reveal significant shortcomings of MLLMs in visual perception, which constrain RL reward signals and hinder effective training. To address this bottleneck, we propose a two-stage RL training framework that first enhances visual perception of geometric structures and then fosters reasoning capabilities. Applied to Qwen2.5-VL-3B-Instruct, our two-stage training improves geometric reasoning by 9.7% and geometric problem solving by 9.1% over direct reasoning training. Our method also generalizes to other vision-intensive domains such as figure understanding, highlighting the importance of perceptual grounding for effective MLLM reasoning.
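The abstract describes the two-stage schedule only at a high level. As a minimal sketch of how such a schedule might be wired up — assuming hypothetical reward functions, a placeholder `rl_update` step, and hypothetical `generate`/`step` model methods, since the paper's actual reward design and RL algorithm are not given here — it could look like:

```python
# Minimal sketch of the two-stage RL schedule described above.
# All names here (perception_reward, reasoning_reward, rl_update, and the
# model's generate/step methods) are hypothetical stand-ins, not the paper's API.

from typing import Callable

def perception_reward(answer: str, gold: str) -> float:
    """Stage-1 reward: exact match on a GeoPQA-style perception question."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def reasoning_reward(answer: str, gold: str) -> float:
    """Stage-2 reward: correctness of the final answer to a geometry problem."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def rl_update(model, batch, reward_fn: Callable[[str, str], float]) -> None:
    """Placeholder for one policy-optimization step (e.g., a GRPO/PPO update)."""
    for question, gold in batch:
        answer = model.generate(question)     # hypothetical generation API
        model.step(reward_fn(answer, gold))   # hypothetical update API

def two_stage_training(model, perception_batches, reasoning_batches):
    # Stage 1: ground visual perception of geometric structures first,
    # so stage-2 reward signals are not capped by perception errors.
    for batch in perception_batches:
        rl_update(model, batch, perception_reward)
    # Stage 2: train geometric reasoning on top of the improved perception.
    for batch in reasoning_batches:
        rl_update(model, batch, reasoning_reward)
```

The key design choice this sketch illustrates is ordering: perception-only rewards come strictly before reasoning rewards, so the policy is not asked to reason over structures it cannot yet reliably perceive.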
Similar Papers
SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion
CV and Pattern Recognition
Helps computers understand 3D shapes and where things are.
SVQA-R1: Reinforcing Spatial Reasoning in MLLMs via View-Consistent Reward Optimization
CV and Pattern Recognition
Teaches computers to understand where things are.
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
Machine Learning (CS)
Teaches computers to see and think better.