Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning
By: Angelos Vlachos, Giorgos Filandrianos, Maria Lymperaiou, and others
Potential Business Impact:
Enables AI to reason over multiple images
We present a Collaborative Agent-Based Framework for Multi-Image Reasoning. Our approach tackles the challenge of interleaved multimodal reasoning across diverse datasets and task formats by employing a dual-agent system: a language-based PromptEngineer, which generates context-aware, task-specific prompts, and a VisionReasoner, a large vision-language model (LVLM) responsible for final inference. The framework is fully automated, modular, and training-free, enabling generalization across classification, question answering, and free-form generation tasks involving one or more input images. We evaluate our method on 18 diverse datasets from the 2025 MIRAGE Challenge (Track A), covering a broad spectrum of visual reasoning tasks, including document QA, visual comparison, dialogue-based understanding, and scene-level inference. Our results demonstrate that LVLMs can effectively reason over multiple images when guided by informative prompts. Notably, Claude 3.7 achieves near-ceiling performance on challenging tasks such as TQA (99.13% accuracy), DocVQA (96.87%), and MMCoQA (75.28 ROUGE-L). We also explore how design choices, such as model selection, shot count, and input length, influence the reasoning performance of different LVLMs.
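The dual-agent pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under assumptions: the class names (`PromptEngineer`, `VisionReasoner`), the task taxonomy, and the prompt templates are hypothetical stand-ins, and the LVLM call is stubbed out rather than wired to a real model such as Claude 3.7.

```python
# Minimal sketch of the dual-agent framework: a language-based agent
# builds a task-specific prompt, then a (stubbed) LVLM agent runs
# final inference over the prompt plus one or more images.
# All names and templates here are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Task:
    kind: str               # e.g. "classification", "qa", "generation"
    question: str
    image_paths: list       # one or more input images


class PromptEngineer:
    """Language agent: generates a context-aware, task-specific prompt."""

    TEMPLATES = {
        "classification": "Classify the content of the {n} image(s). {q}",
        "qa": "Answer using evidence from all {n} image(s). {q}",
        "generation": "Write a free-form response grounded in the {n} image(s). {q}",
    }

    def build_prompt(self, task: Task) -> str:
        template = self.TEMPLATES.get(task.kind, "{q}")
        return template.format(n=len(task.image_paths), q=task.question)


class VisionReasoner:
    """Vision agent: an LVLM responsible for final inference (stubbed)."""

    def infer(self, prompt: str, image_paths: list) -> str:
        # A real implementation would send the prompt and the attached
        # images to an LVLM API and return its response.
        return f"[LVLM answer to: {prompt!r} over {len(image_paths)} image(s)]"


def run_pipeline(task: Task) -> str:
    """Training-free pipeline: prompt generation, then LVLM inference."""
    prompt = PromptEngineer().build_prompt(task)
    return VisionReasoner().infer(prompt, task.image_paths)


task = Task(kind="qa",
            question="Which chart shows the larger total?",
            image_paths=["chart_a.png", "chart_b.png"])
print(run_pipeline(task))
```

Because the framework is training-free, swapping in a different LVLM or task format amounts to changing the stubbed `infer` call or adding a prompt template, which mirrors the modularity the paper claims.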
Similar Papers
Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward
CV and Pattern Recognition
Fixes AI seeing things that aren't there.
Multi-Agent Visual-Language Reasoning for Comprehensive Highway Scene Understanding
CV and Pattern Recognition
Helps cameras see road dangers and warn drivers.
Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning
CV and Pattern Recognition
Helps self-driving cars understand roads better.