Score: 2

VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding

Published: August 13, 2025 | arXiv ID: 2508.09641v1

By: Zhaowei Liu, Xin Guo, Haotian Xia, and more

Potential Business Impact:

Enables AI systems to read and reason over financial charts, statements, and documents, a step toward automating analysis across the financial workflow.

Multimodal large language models (MLLMs) hold great promise for automating complex financial analysis. To comprehensively evaluate their capabilities, we introduce VisFinEval, the first large-scale Chinese benchmark that spans the full front-, middle-, and back-office lifecycle of financial tasks. VisFinEval comprises 15,848 annotated question-answer pairs drawn from eight common financial image modalities (e.g., K-line charts, financial statements, official seals), organized into three hierarchical scenario depths: Financial Knowledge & Data Analysis, Financial Analysis & Decision Support, and Financial Risk Control & Asset Optimization. We evaluate 21 state-of-the-art MLLMs in a zero-shot setting. The top model, Qwen-VL-max, achieves an overall accuracy of 76.3%, outperforming non-expert humans but trailing financial experts by over 14 percentage points. Our error analysis uncovers six recurring failure modes, including cross-modal misalignment, hallucinations, and lapses in business-process reasoning, that highlight critical avenues for future research. VisFinEval aims to accelerate the development of robust, domain-tailored MLLMs capable of seamlessly integrating textual and visual financial information. The data and the code are available at https://github.com/SUFE-AIFLM-Lab/VisFinEval.
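The evaluation protocol described above (zero-shot, multiple-choice accuracy over image-grounded QA pairs) can be sketched in a few lines of Python. The dataset schema, file name, and `query_model` stub below are illustrative assumptions, not the benchmark's actual format or API; the linked repository holds the real data loaders and prompts.

```python
import json

def load_benchmark(path):
    # Load annotated QA pairs; the schema used below (image_path,
    # question, choices, answer) is an assumed format for illustration.
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def query_model(image_path, prompt):
    # Stand-in for a zero-shot MLLM call (e.g., an API client that
    # accepts an image plus a text prompt). Hypothetical placeholder:
    # a real evaluation would send the request to the model here.
    return "A"

def evaluate(samples):
    # Exact-match accuracy over multiple-choice QA pairs, mirroring
    # the zero-shot protocol the abstract describes.
    correct = 0
    for s in samples:
        prompt = s["question"] + "\n" + "\n".join(s["choices"])
        pred = query_model(s["image_path"], prompt).strip()
        correct += int(pred == s["answer"])
    return correct / len(samples)

if __name__ == "__main__":
    samples = load_benchmark("visfineval_qa.json")  # hypothetical file name
    print(f"Zero-shot accuracy: {evaluate(samples):.1%}")
```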

Repos / Data Links
https://github.com/SUFE-AIFLM-Lab/VisFinEval

Page Count
51 pages

Category
Computer Science:
Computational Engineering, Finance, and Science