QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models
By: Zhaolu Kang, Junhao Gong, Wenqing Hu, and more
Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation on financial quantitative tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them with financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate a range of state-of-the-art open-source and proprietary LLMs and observe substantial gaps relative to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data and demonstrate consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
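The abstract describes a backtesting framework that executes model-generated strategies and scores them with financial performance metrics, but the framework's code is not shown here. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes daily close prices, a hypothetical moving-average strategy standing in for a model-generated one, positions in {-1, 0, +1}, a simple proportional cost model, and annualized Sharpe ratio plus maximum drawdown as the metrics. All function names and parameter values are illustrative choices, not taken from the paper.

```python
# Minimal illustrative sketch of a deterministic CTA-style backtest, NOT the
# QuantEval framework. Assumptions (hypothetical): daily close prices,
# positions in {-1, 0, +1}, proportional transaction costs, and annualized
# Sharpe ratio / max drawdown as performance metrics.
import numpy as np

def moving_average_crossover(prices: np.ndarray, fast: int = 10, slow: int = 30) -> np.ndarray:
    """Stand-in for a model-generated strategy: long when the fast MA exceeds the slow MA."""
    positions = np.zeros_like(prices)
    for t in range(slow, len(prices)):
        fast_ma = prices[t - fast:t].mean()
        slow_ma = prices[t - slow:t].mean()
        positions[t] = 1.0 if fast_ma > slow_ma else -1.0
    return positions

def backtest(prices: np.ndarray, positions: np.ndarray, cost_rate: float = 5e-4) -> dict:
    """Apply each day's position to the next day's return and charge costs on turnover."""
    asset_returns = np.diff(prices) / prices[:-1]
    held = positions[:-1]                                  # position at t earns the return over (t, t+1]
    turnover = np.abs(np.diff(positions, prepend=0.0))[:-1]
    strat_returns = held * asset_returns - cost_rate * turnover
    equity = np.cumprod(1.0 + strat_returns)
    drawdown = 1.0 - equity / np.maximum.accumulate(equity)
    sharpe = np.sqrt(252) * strat_returns.mean() / (strat_returns.std() + 1e-12)
    return {"sharpe": float(sharpe),
            "max_drawdown": float(drawdown.max()),
            "total_return": float(equity[-1] - 1.0)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)                         # fixed seed keeps the run deterministic
    prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0003, 0.01, size=1000))
    metrics = backtest(prices, moving_average_crossover(prices))
    print(metrics)
```

A fixed random seed, an explicit cost model, and closed-form metric definitions are what make such a harness deterministic, which is the property the released backtesting configuration (asset universe, cost model, metric definitions) is meant to guarantee.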