Context-Picker: Dynamic context selection using multi-stage reinforcement learning

Published: December 16, 2025 | arXiv ID: 2512.14465v1

By: Siyuan Zhu, Chengdong Xu, Kaiqiang Ke and more

In long-context question answering (LCQA), determining the optimal amount of context for a given query is a significant challenge. Including too few passages may omit critical information, while including too many can introduce noise and reduce the quality of the answer. Traditional approaches, such as fixed Top-$K$ retrieval and single-stage reranking, face the dilemma of selecting the right number of passages. This problem is particularly pronounced for factoid questions, which often require only a few specific pieces of evidence. To address this issue, we introduce Context-Picker, a reasoning-aware framework that shifts the paradigm from similarity-based ranking to minimal sufficient subset selection. Context-Picker treats context selection as a decision-making process optimized via a human-inspired, two-stage reinforcement learning schedule: a recall-oriented stage that prioritizes the coverage of reasoning chains, followed by a precision-oriented stage that aggressively prunes redundancy to distill a compact evidence set. To resolve reward sparsity, we propose an offline evidence distillation pipeline that mines "minimal sufficient sets" via a Leave-One-Out (LOO) procedure, providing dense, task-aligned supervision. Experiments on five long-context and multi-hop QA benchmarks demonstrate that Context-Picker significantly outperforms strong RAG baselines, achieving superior answer accuracy with comparable or reduced context lengths. Ablation studies indicate that the coarse-to-fine optimization schedule, the redundancy-aware reward shaping, and the rationale-guided format all contribute substantially to these gains.
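The abstract names a Leave-One-Out (LOO) procedure for mining minimal sufficient sets but gives no implementation details. The following is a minimal sketch of one plausible greedy reading of that idea, not the authors' pipeline. The function name `loo_minimal_sufficient_set`, the `is_sufficient` callback, and the keyword-matching `toy_judge` are all hypothetical; in practice the sufficiency check would more likely be an LLM or answer-matching judge.

```python
from typing import Callable, List, Sequence


def loo_minimal_sufficient_set(
    passages: Sequence[str],
    is_sufficient: Callable[[List[str]], bool],
) -> List[str]:
    """Greedy leave-one-out pruning (illustrative assumption, not the paper's code).

    Drops each passage in turn and keeps the drop whenever the remaining
    passages are still judged sufficient to answer the question.
    """
    kept = list(passages)
    if not is_sufficient(kept):
        raise ValueError("the full passage set is not sufficient; nothing to distill")

    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if is_sufficient(candidate):
            # Passage i is redundant: remove it and re-test the same index,
            # which now points at the next passage.
            kept = candidate
        else:
            # Passage i is needed: keep it and move on.
            i += 1
    return kept


def toy_judge(required_facts: Sequence[str]) -> Callable[[List[str]], bool]:
    """Hypothetical stand-in for a real judge: checks that every required
    fact string appears somewhere in the selected passages."""
    def judge(selected: List[str]) -> bool:
        text = " ".join(selected).lower()
        return all(fact in text for fact in required_facts)
    return judge


passages = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Poland joined the European Union in 2004.",
    "Marie Curie won two Nobel Prizes.",
]
judge = toy_judge(["born in warsaw", "capital of poland"])
minimal = loo_minimal_sufficient_set(passages, judge)
```

In this toy run, `minimal` retains only the first two passages, since removing either one breaks the two-hop chain while the other two can be dropped freely. A single pass yields a set with no removable passage only if sufficiency is monotone (adding passages never makes a sufficient set insufficient); with a noisy judge, that assumption may not hold and the result can depend on passage order.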

Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$

Computation and Language

Finds the best info for answers, saving computer power.

10 Jun 2025

87%

LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

Computation and Language

Helps computers understand long stories to answer questions.

22 Oct 2025

87%

Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems

Computation and Language

Helps chatbots find the best answers in conversations.

26 Sep 2025
