TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models
By: Shima Imani, Seungwhan Moon, Lambert Mathias, and more
Reliable mathematical and scientific reasoning remains an open challenge for large vision-language models. Standard final-answer evaluation often masks reasoning errors, allowing silent failures to persist. To address this gap, we introduce TRACE, a framework for Transparent Reasoning And Consistency Evaluation that diagnoses reasoning trajectories rather than only end results. At its core, TRACE leverages Auxiliary Reasoning Sets (ARS), compact sub-question-answer pairs that decompose a complex problem into verifiable steps. By evaluating these intermediate steps with consistency-based metrics, TRACE exposes failures that standard evaluation overlooks. Our experiments show that consistency across ARS correlates with final-answer correctness and helps pinpoint the reasoning steps where failures arise, offering actionable signals for model improvement. Furthermore, TRACE defines confidence regions that distinguish reliable from unreliable reasoning paths, supporting effective filtering, debugging, and model refinement.
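To make the ARS idea concrete, here is a minimal Python sketch of a consistency check over auxiliary sub-questions. The abstract does not specify the paper's exact metrics or thresholds, so everything below (`ask_model`, the majority-vote agreement score, and the 0.8 confidence threshold) is a hypothetical illustration of the general approach, not the authors' implementation.

```python
"""Sketch: consistency-based scoring of Auxiliary Reasoning Set (ARS) steps.

Hypothetical illustration only; the paper's actual metrics, sampling
strategy, and thresholds are not given in the abstract.
"""
from collections import Counter


def ask_model(sub_question: str) -> str:
    """Placeholder for a call to a vision-language model; wire up your client here."""
    raise NotImplementedError


def consistency_score(sub_question: str, samples: int = 5) -> float:
    """Sample the model several times and measure agreement on the answer.

    Returns the fraction of samples matching the majority answer
    (1.0 = fully consistent, 1/samples = maximally inconsistent).
    """
    answers = [ask_model(sub_question) for _ in range(samples)]
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / samples


def trace_check(ars: list[str], threshold: float = 0.8) -> list[tuple[str, float, bool]]:
    """Score each ARS sub-question and flag steps below the threshold.

    Low-consistency steps localize where a reasoning trajectory breaks
    down, mirroring the diagnostic role ARS plays in TRACE.
    """
    results = []
    for question in ars:
        score = consistency_score(question)
        results.append((question, score, score >= threshold))
    return results
```

Under this reading, a trajectory whose steps all clear the threshold would fall inside a "confidence region," while any flagged step marks a candidate failure point for filtering or debugging.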
Similar Papers
ReTrace: Interactive Visualizations for Reasoning Traces of Large Reasoning Models
Human-Computer Interaction
Shows how AI thinks, making it easier to understand.
TRACE: A Self-Improving Framework for Robot Behavior Forecasting with Vision-Language Models
Robotics
Helps robots predict tricky moves with less information.
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
CV and Pattern Recognition
Shows how computers "see" to solve problems.