Coupled Variational Reinforcement Learning for Language Model General Reasoning
By: Xueru Wen, Jie Lou, Yanjiang Liu, and more
Potential Business Impact:
Helps language models reason through problems more effectively.
While reinforcement learning has achieved impressive progress in language model reasoning, it is constrained by the requirement for verifiable rewards. Recent verifier-free RL methods address this limitation by using the intrinsic probability that an LLM assigns to the reference answer as the reward signal. However, these approaches typically sample reasoning traces conditioned only on the question. This design decouples reasoning-trace sampling from answer information, leading to inefficient exploration and incoherence between traces and final answers. In this paper, we propose Coupled Variational Reinforcement Learning (CoVRL), which bridges variational inference and reinforcement learning by coupling prior and posterior distributions through a hybrid sampling strategy. By constructing and optimizing a composite distribution that integrates these two distributions, CoVRL enables efficient exploration while preserving strong thought-answer coherence. Extensive experiments on mathematical and general reasoning benchmarks show that CoVRL improves performance by 12.4% over the base model and achieves an additional 2.3% improvement over strong state-of-the-art verifier-free RL baselines, providing a principled framework for enhancing the general reasoning capabilities of language models.
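To make the hybrid sampling idea concrete, below is a minimal Python sketch of how verifier-free rollouts might mix prior sampling (trace conditioned on the question only) with posterior sampling (trace conditioned on the question and the reference answer), scoring each trace by the model's likelihood of the reference answer. The function names (`sample_trace`, `answer_logprob`, `hybrid_rollouts`), the mixing probability `mix`, and the mean-baseline advantage are illustrative assumptions, not the paper's exact formulation.

```python
import random
from typing import Optional

# Sketch of verifier-free RL rollouts with hybrid prior/posterior sampling.
# sample_trace and answer_logprob stand in for calls to an actual LLM; their
# names and signatures are hypothetical, not taken from the paper.

def sample_trace(question: str, answer_hint: Optional[str] = None) -> str:
    """Placeholder: sample a reasoning trace from the LLM.
    With answer_hint=None this plays the role of the prior p(trace | question);
    with a reference answer it plays the role of the posterior
    q(trace | question, answer)."""
    suffix = " (guided by reference answer)" if answer_hint else ""
    return f"reasoning for '{question}'{suffix}"

def answer_logprob(question: str, trace: str, answer: str) -> float:
    """Placeholder: log-probability the LLM assigns to the reference answer
    given the question and the sampled trace; this intrinsic probability is
    the verifier-free reward signal described in the abstract."""
    return random.uniform(-5.0, 0.0)

def hybrid_rollouts(question: str, answer: str, n: int = 8, mix: float = 0.5):
    """Draw n traces from a composite of prior and posterior sampling.
    `mix` is an assumed probability of drawing from the posterior."""
    rollouts = []
    for _ in range(n):
        use_posterior = random.random() < mix
        trace = sample_trace(question, answer if use_posterior else None)
        reward = answer_logprob(question, trace, answer)
        rollouts.append({"trace": trace, "reward": reward,
                         "from_posterior": use_posterior})
    return rollouts

if __name__ == "__main__":
    batch = hybrid_rollouts("What is 17 * 24?", "408")
    baseline = sum(r["reward"] for r in batch) / len(batch)
    for r in batch:
        # Advantage-style weighting, as in typical policy-gradient updates.
        r["advantage"] = r["reward"] - baseline
    print(batch[0])
```

In this sketch, posterior-guided traces keep sampling coherent with the final answer while prior traces preserve exploration; how CoVRL actually constructs and optimizes the composite distribution is detailed in the paper itself.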
Similar Papers
Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement
Machine Learning (CS)
Teaches computers to think better by comparing answers.
RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models
Artificial Intelligence
Helps computers learn to solve harder problems.
COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning
Robotics
Helps self-driving cars learn better and safer.