Score: 1

See or Say Graphs: Agent-Driven Scalable Graph Understanding with Vision-Language Models

Published: October 19, 2025 | arXiv ID: 2510.16769v1

By: Shuo Han, Yukun Cao, Zezhong Ding, and more

Potential Business Impact:

Lets AI systems understand and reason over very large graphs by coordinating text-based and image-based analysis.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Vision-language models (VLMs) have shown promise in graph understanding, but remain limited by input-token constraints, facing scalability bottlenecks and lacking effective mechanisms to coordinate textual and visual modalities. To address these challenges, we propose GraphVista, a unified framework that enhances both scalability and modality coordination in graph understanding. For scalability, GraphVista organizes graph information hierarchically into a lightweight GraphRAG base, which retrieves only task-relevant textual descriptions and high-resolution visual subgraphs, compressing redundant context while preserving key reasoning elements. For modality coordination, GraphVista introduces a planning agent that routes tasks to the most suitable modality: the text modality for simple property reasoning, and the visual modality for local and structurally complex reasoning grounded in explicit topology. Extensive experiments demonstrate that GraphVista scales to large graphs, up to $200\times$ larger than those used in existing benchmarks, and consistently outperforms existing textual, visual, and fusion-based methods, achieving up to $4.4\times$ quality improvement over state-of-the-art baselines by fully exploiting the complementary strengths of both modalities.
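
To make the abstract's two ideas concrete, here is a minimal, hypothetical sketch of modality routing plus task-relevant retrieval. It is not the paper's implementation: the function names (`plan_modality`, `retrieve_context`), the keyword-based router, and the toy graph representation are illustrative assumptions; GraphVista's actual planning agent and GraphRAG base are more sophisticated.

```python
# Illustrative sketch of GraphVista-style modality routing and task-relevant
# retrieval. All names and heuristics here are hypothetical stand-ins.

from dataclasses import dataclass

# Keyword sets are a crude proxy for the paper's planning agent, which sends
# simple property questions to the text modality and structural questions to
# the visual modality.
PROPERTY_KEYWORDS = {"degree", "count", "attribute", "label", "exists"}
STRUCTURE_KEYWORDS = {"path", "cycle", "community", "connected", "shortest", "neighborhood"}


@dataclass
class Plan:
    modality: str          # "text" or "vision"
    relevant_nodes: list   # nodes whose descriptions / subgraph views to retrieve


def plan_modality(query: str, mentioned_nodes: list) -> Plan:
    """Route the query to the modality better suited to answer it."""
    tokens = set(query.lower().replace("?", "").split())
    if tokens & STRUCTURE_KEYWORDS:
        return Plan(modality="vision", relevant_nodes=mentioned_nodes)
    return Plan(modality="text", relevant_nodes=mentioned_nodes)


def retrieve_context(graph: dict, plan: Plan) -> str:
    """Return only task-relevant context instead of serializing the whole graph."""
    if plan.modality == "text":
        # Compact textual descriptions of the mentioned nodes.
        lines = [f"node {n}: neighbors={sorted(graph.get(n, []))}" for n in plan.relevant_nodes]
        return "\n".join(lines)
    # For the vision path, a real system would render a high-resolution subgraph
    # image for the VLM; here we only name the induced subgraph to be rendered.
    nodes = set(plan.relevant_nodes)
    for n in plan.relevant_nodes:
        nodes |= set(graph.get(n, []))
    return f"render subgraph induced by nodes {sorted(nodes)}"


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
    plan = plan_modality("Is there a path from a to d?", ["a", "d"])
    print(plan.modality)                      # -> vision
    print(retrieve_context(toy_graph, plan))  # -> subgraph to render for the VLM
```

The point of the sketch is the division of labor the abstract describes: retrieval compresses the graph down to what the question needs, and the router decides whether the VLM should read text descriptions or look at a rendered subgraph.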

Country of Origin
🇨🇳 China

Page Count
19 pages

Category
Computer Science:
Artificial Intelligence