TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models
By: Shima Imani, Seungwhan Moon, Lambert Mathias, and more
Reliable mathematical and scientific reasoning remains an open challenge for large vision-language models. Standard final-answer evaluation often masks reasoning errors, allowing silent failures to persist. To address this gap, we introduce TRACE, a framework for Transparent Reasoning And Consistency Evaluation that diagnoses reasoning trajectories rather than only end results. At its core, TRACE leverages Auxiliary Reasoning Sets (ARS), compact sub-question-answer pairs that decompose a complex problem into verifiable steps. By evaluating these intermediate steps with consistency-based metrics, TRACE exposes failures that standard evaluation overlooks. Our experiments show that consistency across ARS correlates with final-answer correctness and helps pinpoint the reasoning steps where failures arise, offering actionable signals for model improvement. Furthermore, TRACE defines confidence regions that distinguish reliable from unreliable reasoning paths, supporting effective filtering, debugging, and model refinement.
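To make the ARS idea concrete, here is a minimal Python sketch of a consistency check over auxiliary sub-questions. The abstract does not specify the paper's exact metrics or thresholds, so everything below (`ask_model`, the majority-vote agreement score, and the 0.8 confidence threshold) is a hypothetical illustration of the general approach, not the authors' implementation.

```python
"""Sketch: consistency-based scoring of Auxiliary Reasoning Set (ARS) steps.

Hypothetical illustration only; the paper's actual metrics, sampling
strategy, and thresholds are not given in the abstract.
"""
from collections import Counter


def ask_model(sub_question: str) -> str:
    """Placeholder for a call to a vision-language model; wire up your client here."""
    raise NotImplementedError


def consistency_score(sub_question: str, samples: int = 5) -> float:
    """Sample the model several times and measure agreement on the answer.

    Returns the fraction of samples matching the majority answer
    (1.0 = fully consistent, 1/samples = maximally inconsistent).
    """
    answers = [ask_model(sub_question) for _ in range(samples)]
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / samples


def trace_check(ars: list[str], threshold: float = 0.8) -> list[tuple[str, float, bool]]:
    """Score each ARS sub-question and flag steps below the threshold.

    Low-consistency steps localize where a reasoning trajectory breaks
    down, mirroring the diagnostic role ARS plays in TRACE.
    """
    results = []
    for question in ars:
        score = consistency_score(question)
        results.append((question, score, score >= threshold))
    return results
```

Under this reading, a trajectory whose steps all clear the threshold would fall inside a "confidence region," while any flagged step marks a candidate failure point for filtering or debugging.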
Similar Papers
ReTrace: Interactive Visualizations for Reasoning Traces of Large Reasoning Models
Human-Computer Interaction
Shows how AI thinks, making it easier to understand.
TRACE: A Self-Improving Framework for Robot Behavior Forecasting with Vision-Language Models
Robotics
Helps robots predict tricky moves with less information.
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
CV and Pattern Recognition
Shows how computers "see" to solve problems.