Score: 2

XFinBench: Benchmarking LLMs in Complex Financial Problem Solving and Reasoning

Published: August 20, 2025 | arXiv ID: 2508.15861v1

By: Zhihan Zhang, Yixin Cao, Lizi Liao

Potential Business Impact:

Tests computers on hard money problems.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Solving financial problems demands complex reasoning, multimodal data processing, and a broad technical understanding, presenting unique challenges for current large language models (LLMs). We introduce XFinBench, a novel benchmark with 4,235 examples designed to evaluate LLMs' ability to solve complex, knowledge-intensive financial problems across diverse graduate-level finance topics with multimodal context. Using XFinBench, we identify five core capabilities of LLMs, i.e., terminology understanding, temporal reasoning, future forecasting, scenario planning, and numerical modelling. We conduct extensive experiments on 18 leading models with XFinBench. The results show that o1 is the best-performing text-only model with an overall accuracy of 67.3%, but it still lags significantly behind human experts, by 12.5%, especially in temporal reasoning and scenario planning. We further construct a knowledge bank with 3,032 finance terms for knowledge augmentation analysis, and find that knowledge relevant to the question brings consistent accuracy improvements only to small open-source models. Additionally, our error analysis reveals that rounding errors during calculation and blindness to the position and intersection of curves in images are the two primary issues behind models' poor performance on calculation and visual-context questions, respectively. Code and dataset are accessible via GitHub: https://github.com/Zhihan72/XFinBench.

FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs

Artificial Intelligence

Tests how smart computers are with money.

18 May 2025 2

90%

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

CV and Pattern Recognition

Tests computers on money math with pictures.

6 Aug 2025 0

90%

FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging

Computation and Language

Teaches computers to solve tricky money math problems.

6 Jun 2025 0


Country of Origin

🇨🇳 🇸🇬 Singapore, China

Repos / Data Links

https://github.com/Zhihan72/XFinBench

Page Count

44 pages
