Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
By: Ming Chen, Sheng Tang, Rong-Xi Tan, and more
Potential Business Impact:
Helps language models predict numerical values more accurately.
Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches that rely on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process and use sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, highlighting the benefit of sequence-level signals. Our analysis further reveals that RL significantly enhances sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
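To make the core idea concrete, here is a minimal Python sketch (not the authors' released code) of what a sequence-level reward for decoding-based regression could look like: the generated token string is decoded into a float and scored by its error against the target, and a group of sampled completions is turned into GRPO-style group-relative advantages. The function names, the invalid-sequence penalty, and the standardization details are illustrative assumptions following the standard GRPO recipe, not necessarily the paper's exact design.

```python
import re

def decode_number(token_str):
    """Parse a generated token sequence back into a float.
    Returns None when the sequence is not a valid number."""
    match = re.fullmatch(r"[+-]?\d+(\.\d+)?([eE][+-]?\d+)?", token_str.strip())
    return float(token_str) if match else None

def sequence_reward(generated, target):
    """Sequence-level reward: negative absolute error between the
    decoded prediction and the ground-truth value. Malformed
    sequences get a fixed penalty (an assumed design choice)."""
    pred = decode_number(generated)
    if pred is None:
        return -10.0  # penalty for non-numeric output (assumption)
    return -abs(pred - target)

def grpo_advantages(rewards):
    """GRPO-style group-relative advantages: standardize each
    sampled completion's reward against the group mean/std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (var ** 0.5 + 1e-8) for r in rewards]

# Example: a group of sampled completions for one regression prompt.
completions = ["3.14", "3.2", "abc", "2.9"]
target = 3.1
rewards = [sequence_reward(c, target) for c in completions]
print(rewards)                  # e.g. [-0.04, -0.10, -10.0, -0.20]
print(grpo_advantages(rewards))
```

Unlike per-token cross-entropy, this reward scores the completed number as a whole, so an error in a high-order digit is penalized far more than one in a trailing decimal, which is the global-magnitude signal the abstract argues token-level objectives miss.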
Similar Papers
ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
Artificial Intelligence
Makes AI better at solving coding problems.
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards
Computation and Language
Teaches computers to solve math problems better.