QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models
By: Zhaolu Kang, Junhao Gong, Wenqing Hu, and more
Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation on financial quantitative tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them with financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate a range of state-of-the-art open-source and proprietary LLMs and observe substantial gaps relative to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data and demonstrate consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
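The abstract describes a backtesting framework that executes model-generated strategies and scores them with financial performance metrics, but the framework's code is not shown here. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes daily close prices, a hypothetical moving-average strategy standing in for a model-generated one, positions in {-1, 0, +1}, a simple proportional cost model, and annualized Sharpe ratio plus maximum drawdown as the metrics. All function names and parameter values are illustrative choices, not taken from the paper.

```python
# Minimal illustrative sketch of a deterministic CTA-style backtest, NOT the
# QuantEval framework. Assumptions (hypothetical): daily close prices,
# positions in {-1, 0, +1}, proportional transaction costs, and annualized
# Sharpe ratio / max drawdown as performance metrics.
import numpy as np

def moving_average_crossover(prices: np.ndarray, fast: int = 10, slow: int = 30) -> np.ndarray:
    """Stand-in for a model-generated strategy: long when the fast MA exceeds the slow MA."""
    positions = np.zeros_like(prices)
    for t in range(slow, len(prices)):
        fast_ma = prices[t - fast:t].mean()
        slow_ma = prices[t - slow:t].mean()
        positions[t] = 1.0 if fast_ma > slow_ma else -1.0
    return positions

def backtest(prices: np.ndarray, positions: np.ndarray, cost_rate: float = 5e-4) -> dict:
    """Apply each day's position to the next day's return and charge costs on turnover."""
    asset_returns = np.diff(prices) / prices[:-1]
    held = positions[:-1]                                  # position at t earns the return over (t, t+1]
    turnover = np.abs(np.diff(positions, prepend=0.0))[:-1]
    strat_returns = held * asset_returns - cost_rate * turnover
    equity = np.cumprod(1.0 + strat_returns)
    drawdown = 1.0 - equity / np.maximum.accumulate(equity)
    sharpe = np.sqrt(252) * strat_returns.mean() / (strat_returns.std() + 1e-12)
    return {"sharpe": float(sharpe),
            "max_drawdown": float(drawdown.max()),
            "total_return": float(equity[-1] - 1.0)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)                         # fixed seed keeps the run deterministic
    prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0003, 0.01, size=1000))
    metrics = backtest(prices, moving_average_crossover(prices))
    print(metrics)
```

A fixed random seed, an explicit cost model, and closed-form metric definitions are what make such a harness deterministic, which is the property the released backtesting configuration (asset universe, cost model, metric definitions) is meant to guarantee.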