Score: 2

Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

Published: November 12, 2025 | arXiv ID: 2511.09109v2

By: Wenda Wei, Yu-An Liu, Ruqing Zhang and more

BigTech Affiliations: Baidu

Potential Business Impact:

Helps AI think through problems step-by-step.

Business Areas:

Augmented Reality Hardware, Software

Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. Most approaches rely on outcome-based supervision, offering no explicit guidance for intermediate steps. This often leads to reward hacking and degraded response quality. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions. To assess the information completeness of each step, we introduce a bidirectional information distance grounded in Kolmogorov complexity, approximated via language model generation probabilities. This quantification measures both how far the current reasoning is from the answer and how well it addresses the question. To optimize reasoning under these bidirectional signals, we adopt a multi-objective reinforcement learning framework with a cascading reward structure that emphasizes early trajectory alignment. Empirical results on seven question answering benchmarks demonstrate that Bi-RAR surpasses previous methods and enables efficient interaction and reasoning with the search engine during training and inference.
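The bidirectional scoring idea in the abstract can be illustrated with a minimal sketch. It is not the Bi-RAR formulation, which this page does not give. It assumes that Kolmogorov complexity K(x | y) is approximated by a code length of -log2 P(x | y), and it swaps the language model for a toy add-one-smoothed unigram model so the example runs without dependencies. The names `code_length`, `normalized_distance`, `bidirectional_distance`, and `VOCAB_SIZE`, as well as the normalization by unconditional code length, are hypothetical choices made for illustration.

```python
import math
from collections import Counter

VOCAB_SIZE = 10_000  # assumed vocabulary size for add-one smoothing (hypothetical)


def code_length(target: str, context: str = "") -> float:
    """Bits needed to encode `target` given `context` under a toy unigram model.

    Stand-in for -log2 P(target | context) from a real language model.
    """
    counts = Counter(context.lower().split())
    total = sum(counts.values())
    bits = 0.0
    for token in target.lower().split():
        prob = (counts[token] + 1) / (total + VOCAB_SIZE)
        bits -= math.log2(prob)
    return bits


def normalized_distance(target: str, context: str) -> float:
    """Conditional code length relative to the unconditional one (lower = closer)."""
    return code_length(target, context) / code_length(target)


def bidirectional_distance(question: str, step: str, answer: str) -> tuple[float, float]:
    """Score one reasoning step in both directions.

    forward:  how far the step still is from the answer
    backward: how well the step accounts for the question
    """
    forward = normalized_distance(answer, step)
    backward = normalized_distance(question, step)
    return forward, backward


question = "who wrote hamlet"
answer = "william shakespeare"
relevant_step = "the play hamlet was written by william shakespeare"
unrelated_step = "paris is the capital of france"

relevant = bidirectional_distance(question, relevant_step, answer)
unrelated = bidirectional_distance(question, unrelated_step, answer)
print("relevant step: ", relevant)
print("unrelated step:", unrelated)
```

In a realistic setup, `code_length` would be replaced by summed token log-probabilities from a language model. How the two signals are turned into rewards, including the cascading structure the abstract mentions, is not specified on this page and is left out of the sketch.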

MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval

Computation and Language

Lets computers find better answers from many sources.

31 Oct 2025 1

92%

Retrieval-augmented reasoning with lean language models

Computation and Language

Lets small computers answer hard questions accurately.

15 Aug 2025 2

92%

GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning

Computation and Language

Helps computers answer hard questions by planning steps.

23 Oct 2025 1


Country of Origin

🇳🇱 🇨🇳 China, Netherlands

Page Count

10 pages
