Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification

Published: January 7, 2026 | arXiv ID: 2601.03948v1

By: Rui Sun, Yifan Sun, Sheng Xu and more

Potential Business Impact:

Helps AI make better money choices in messy markets.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Reinforcement Learning (RL) has enabled Large Language Models (LLMs) to achieve remarkable reasoning in domains like mathematics and coding, where verifiable rewards provide clear signals. However, extending this paradigm to financial decision-making is challenged by the market's stochastic nature: rewards are verifiable but inherently noisy, causing standard RL to degenerate into reward hacking. To address this, we propose Trade-R1, a model training framework that bridges verifiable rewards to stochastic environments via process-level reasoning verification. Our key innovation is a verification method that transforms the problem of evaluating reasoning over lengthy financial documents into a structured Retrieval-Augmented Generation (RAG) task. We construct a triangular consistency metric that assesses pairwise alignment among retrieved evidence, reasoning chains, and decisions, and that serves as a validity filter for noisy market returns. We explore two reward integration strategies: Fixed-effect Semantic Reward (FSR) for stable alignment signals, and Dynamic-effect Semantic Reward (DSR) for coupled magnitude optimization. Experiments on asset selection across markets in different countries demonstrate that our paradigm reduces reward hacking, with DSR achieving superior cross-market generalization while maintaining the highest reasoning consistency.
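
The abstract does not give formulas for the triangular consistency metric or for FSR and DSR, so the following is only a rough sketch of one way the idea could be read, not the authors' implementation. It assumes three pairwise alignment scores in [0, 1] (evidence vs. reasoning, reasoning vs. decision, evidence vs. decision), aggregates them with a plain mean, treats FSR as an additive bonus with a fixed weight, and treats DSR as a multiplicative scaling of the market return. All names, the aggregation rule, and the `weight` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairwiseScores:
    """Hypothetical pairwise alignment scores, each assumed to lie in [0, 1]."""
    evidence_reasoning: float
    reasoning_decision: float
    evidence_decision: float


def triangular_consistency(s: PairwiseScores) -> float:
    """Aggregate the three pairwise scores (assumption: unweighted mean)."""
    return (s.evidence_reasoning + s.reasoning_decision + s.evidence_decision) / 3.0


def fixed_effect_reward(market_return: float, consistency: float,
                        weight: float = 0.5) -> float:
    """Assumed FSR form: noisy return plus a fixed-weight consistency bonus."""
    return market_return + weight * consistency


def dynamic_effect_reward(market_return: float, consistency: float) -> float:
    """Assumed DSR form: consistency scales the magnitude of the return."""
    return consistency * market_return


if __name__ == "__main__":
    scores = PairwiseScores(evidence_reasoning=0.9,
                            reasoning_decision=0.6,
                            evidence_decision=0.3)
    c = triangular_consistency(scores)
    print("consistency:", c)
    print("FSR reward:", fixed_effect_reward(0.02, c))
    print("DSR reward:", dynamic_effect_reward(0.02, c))
```

Under these assumptions, a lucky return backed by poorly grounded reasoning earns little under the multiplicative form, which is one plausible reading of "validity filter for noisy market returns"; the paper itself should be consulted for the actual definitions.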

Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification

Artificial Intelligence

Helps AI make smarter money choices in messy markets.

7 Jan 2026

91%

Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards

Computation and Language

Teaches computers to solve math problems better.

21 Nov 2025

91%

Trading-R1: Financial Trading with LLM Reasoning via Reinforcement Learning

Trading & Market Microstructure

Helps computers make smart, safe money trades.

14 Sep 2025

Country of Origin

🇨🇳 China

Page Count

17 pages