Simple Radiology VLLM Test-time Scaling with Thought Graph Traversal
By: Yue Yao, Zelin Wen, Yan Tong and more
Potential Business Impact:
Helps AI write better medical reports.
Test-time scaling offers a promising way to improve the reasoning performance of vision-language large models (VLLMs) without additional training. In this paper, we explore a simple but effective approach for applying test-time scaling to radiology report generation. Specifically, we introduce a lightweight Thought Graph Traversal (TGT) framework that guides the model to reason through organ-specific findings in a medically coherent order. This framework integrates structured medical priors into the prompt, enabling deeper and more logical analysis with no changes to the underlying model. To further enhance reasoning depth, we apply a reasoning budget forcing strategy that adjusts the model's inference depth at test time by dynamically extending its generation process. This simple yet powerful combination allows a frozen radiology VLLM to self-correct and generate more accurate, consistent chest X-ray reports. Our method outperforms baseline prompting approaches on standard benchmarks, and also reveals dataset biases through traceable reasoning paths. Code and prompts are open-sourced for reproducibility at https://github.com/glerium/Thought-Graph-Traversal.
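The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The graph contents, the function names `traversal_order`, `build_prompt`, and `generate_with_budget`, the nudge phrase, and the word-count budget are all assumptions made for illustration. The `generate` argument stands in for any frozen model call that maps a prompt string to text.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical thought graph: each key is an anatomical region and its list
# holds regions to examine afterwards. The regions and edges are illustrative
# assumptions, not the graph used in the paper.
THOUGHT_GRAPH: Dict[str, List[str]] = {
    "lungs": ["pleura", "heart"],
    "pleura": ["bones"],
    "heart": ["mediastinum"],
    "mediastinum": ["bones"],
    "bones": [],
}


def traversal_order(graph: Dict[str, List[str]]) -> List[str]:
    """Return a topological order of the graph (Kahn's algorithm)."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] += 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(graph):
        raise ValueError("thought graph contains a cycle")
    return order


def build_prompt(order: List[str]) -> str:
    """Turn the traversal order into step-by-step prompt instructions."""
    steps = [f"{i}. Describe findings for the {region}."
             for i, region in enumerate(order, start=1)]
    return "Analyze the chest X-ray in this order:\n" + "\n".join(steps)


def generate_with_budget(
    generate: Callable[[str], str],
    prompt: str,
    min_words: int,
    max_rounds: int = 3,
    nudge: str = "Wait, let me re-check each region.",
) -> str:
    """Extend the output until it reaches min_words or max_rounds is hit.

    Each round appends the nudge and asks the model to continue, which is
    one simple way to force a larger reasoning budget at test time.
    """
    text = generate(prompt)
    rounds = 0
    while len(text.split()) < min_words and rounds < max_rounds:
        continuation = generate(prompt + "\n" + text + " " + nudge)
        text = text + " " + nudge + " " + continuation
        rounds += 1
    return text
```

A caller would pass `build_prompt(traversal_order(THOUGHT_GRAPH))` together with the image to the model, then wrap the model call in `generate_with_budget`. Counting whitespace-separated words is a stand-in for a token budget; a real setup would more likely count tokens with the model's tokenizer.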