Score: 1

Seeing Beyond Words: MatVQA for Challenging Visual-Scientific Reasoning in Materials Science

Published: May 23, 2025 | arXiv ID: 2505.18319v1

By: Sifan Wu, Huan Zhang, Yizhan Li, and more

Potential Business Impact:

Helps AI systems interpret scientific images of materials (e.g., microscopy and diffraction patterns), supporting faster materials discovery and design.

Business Areas:
Advanced Materials Manufacturing, Science and Engineering

The emergence of Multimodal Large Language Models (MLLMs) that integrate vision and language modalities has unlocked new potential for scientific reasoning, with strong performance on prior benchmarks in both natural-language and coding domains. Current materials science evaluation datasets such as MaScQA and SciQA remain largely text-based and fail to capture the visual and research-level analytic complexity required in materials discovery and design. We introduce MatVQA, a scalable benchmark specifically designed to address this gap. Generated by an automated pipeline, MArxivAgent, from recent materials literature, MatVQA features 1,325 questions across four critical structure-property-performance (SPP) reasoning tasks. Uniquely, MatVQA employs an iterative process to eliminate textual shortcuts, compelling MLLMs to perform fine-grained, low-level visual analysis of material imagery (e.g., microscopy images, diffraction patterns) integrated with multi-step scientific reasoning. Benchmarking 17 open- and closed-source MLLMs on MatVQA reveals substantial gaps in current multimodal reasoning capabilities. The MatVQA benchmark data and evaluation code are publicly available at https://anonymous.4open.science/r/matvqa-1E01/README.md to catalyze further research in applying MLLMs to complex materials science problems.
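The shortcut-elimination step described in the abstract can be pictured as a reject-and-rewrite loop: a question is kept only once a model that sees the text alone can no longer answer it, so the image becomes indispensable. Below is a minimal, hypothetical Python sketch of that idea, not the paper's actual MArxivAgent implementation; ask_text_only and rewrite_question are placeholder callables supplied by the caller.

    # Hedged sketch of iterative textual-shortcut elimination.
    # All names here are illustrative placeholders, not the paper's API.
    def eliminate_textual_shortcuts(question, answer,
                                    ask_text_only, rewrite_question,
                                    max_rounds=5):
        """Rewrite a VQA item until text alone is insufficient to answer it."""
        for _ in range(max_rounds):
            # Query the model with the question text but no image.
            guess = ask_text_only(question)
            if guess != answer:
                # Text-only model fails: the image is now required. Keep it.
                return question
            # Text-only model still succeeds: remove the leaked cue and retry.
            question = rewrite_question(question, leaked_answer=guess)
        return None  # Shortcut could not be removed within the budget.

Under this sketch, an item that survives the loop is one where low-level visual analysis of the figure, rather than textual context, carries the answer.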

Country of Origin
🇨🇦 Canada

Repos / Data Links
https://anonymous.4open.science/r/matvqa-1E01/README.md

Page Count
17 pages

Category
Computer Science:
Computational Engineering, Finance, and Science