Context-Picker: Dynamic context selection using multi-stage reinforcement learning
By: Siyuan Zhu, Chengdong Xu, Kaiqiang Ke and more
In long-context question answering (LCQA), determining the optimal amount of context for a given query is a significant challenge. Including too few passages may omit critical information, while including too many can introduce noise and reduce the quality of the answer. Traditional approaches, such as fixed Top-$K$ retrieval and single-stage reranking, face the dilemma of selecting the right number of passages. This problem is particularly pronounced for factoid questions, which often require only a few specific pieces of evidence. To address this issue, we introduce \emph{Context-Picker}, a reasoning-aware framework that shifts the paradigm from similarity-based ranking to minimal sufficient subset selection. Context-Picker treats context selection as a decision-making process optimized via a human-inspired, two-stage reinforcement learning schedule: a \emph{recall-oriented} stage that prioritizes the coverage of reasoning chains, followed by a \emph{precision-oriented} stage that aggressively prunes redundancy to distill a compact evidence set. To resolve reward sparsity, we propose an offline evidence distillation pipeline that mines "minimal sufficient sets" via a Leave-One-Out (LOO) procedure, providing dense, task-aligned supervision. Experiments on five long-context and multi-hop QA benchmarks demonstrate that Context-Picker significantly outperforms strong RAG baselines, achieving superior answer accuracy with comparable or reduced context lengths. Ablation studies indicate that the coarse-to-fine optimization schedule, the redundancy-aware reward shaping, and the rationale-guided format all contribute substantially to these gains.
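The abstract does not spell out the Leave-One-Out (LOO) procedure, so the following is only a minimal sketch of one plausible reading: start from a candidate passage set that is sufficient to answer the query, then try dropping each passage in turn and keep the drop whenever the remaining set is still sufficient. The function name `loo_minimal_sufficient_set`, the `is_sufficient` callback, and the keyword-based `toy_judge` are all hypothetical and not taken from the paper; in practice the sufficiency check would presumably involve a generator and an answer judge.

```python
from typing import Callable, List, Sequence


def loo_minimal_sufficient_set(
    passages: Sequence[str],
    is_sufficient: Callable[[List[str]], bool],
) -> List[str]:
    """Greedy leave-one-out pruning (illustrative sketch, not the paper's code).

    `is_sufficient` is a hypothetical judge that returns True when the given
    passages are enough to answer the query correctly.
    """
    kept = list(passages)
    # If the full set is not sufficient, there is nothing meaningful to prune.
    if not is_sufficient(kept):
        return kept

    i = 0
    while i < len(kept):
        without_i = kept[:i] + kept[i + 1:]
        if is_sufficient(without_i):
            # Passage i is redundant: drop it and re-test the same index.
            kept = without_i
        else:
            # Passage i is needed: keep it and move on.
            i += 1
    return kept


# Toy data and a toy judge (assumptions for illustration only): the "answer"
# needs one passage mentioning "capital" and one mentioning "Eiffel".
passages = [
    "Paris is the capital of France.",
    "France is in Europe.",
    "The Eiffel Tower is in Paris.",
    "Bananas are yellow.",
]


def toy_judge(selected: List[str]) -> bool:
    has_capital = any("capital" in p for p in selected)
    has_eiffel = any("Eiffel" in p for p in selected)
    return has_capital and has_eiffel


minimal_set = loo_minimal_sufficient_set(passages, toy_judge)
print(minimal_set)
```

If the judge is monotone (adding passages never turns a sufficient set into an insufficient one), the returned set has the property that no single remaining passage can be removed without losing sufficiency. A noisy LLM-based judge would not guarantee this, which is one reason to treat the sketch as illustrative.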
Similar Papers
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$
Computation and Language
Finds the best info for answers, saving computer power.
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Computation and Language
Helps computers understand long stories to answer questions.
Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
Computation and Language
Helps chatbots find the best answers in conversations.