CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning
By: Xuanzhang Liu, Jianglun Feng, Zhuoran Zhuang, and more
Potential Business Impact:
Helps computers remember more to solve harder problems.
Large Language Model (LLM) agents trained with reinforcement learning (RL) show great promise for solving complex, multi-step tasks. However, their performance is often crippled by "context explosion," where the accumulation of long text outputs overwhelms the model's context window and leads to reasoning failures. To address this, we introduce CoDA, a Context-Decoupled hierarchical Agent: a simple but effective reinforcement learning framework that decouples high-level planning from low-level execution. It employs a single, shared LLM backbone that learns to operate in two distinct, contextually isolated roles: a high-level Planner that decomposes tasks within a concise strategic context, and a low-level Executor that handles tool interactions in an ephemeral, isolated workspace. We train this unified agent end-to-end using PECO (Planner-Executor Co-Optimization), a reinforcement learning methodology that applies a trajectory-level reward to jointly optimize both roles, fostering seamless collaboration through context-dependent policy updates. Extensive experiments demonstrate that CoDA achieves significant performance improvements over state-of-the-art baselines on complex multi-hop question-answering benchmarks. It also exhibits strong robustness in long-context scenarios, maintaining stable performance where all other baselines suffer severe degradation, further validating the effectiveness of our hierarchical design in mitigating context overload.
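The core architectural idea — one shared backbone serving two contextually isolated roles, with only compact summaries flowing from Executor back to Planner — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `backbone` is a toy rule-based stand-in for the shared LLM, and the prompts, tools, and subtask format are all hypothetical.

```python
def backbone(role, context):
    """Toy stand-in for the single shared LLM operating in two roles."""
    if role == "planner":
        # Planner: decompose the task inside a concise strategic context
        # (hypothetical decomposition rule for illustration only).
        return [f"lookup:{part}" for part in context["question"].split(" and ")]
    else:
        # Executor: simulate tool calls producing a long raw trace,
        # then distill it into a short summary.
        raw_trace = f"TOOL OUTPUT for {context['subtask']} " * 50  # long text
        summary = f"answer({context['subtask']})"
        return raw_trace, summary

def run_episode(question):
    planner_ctx = {"question": question, "notes": []}  # stays concise
    for sub in backbone("planner", planner_ctx):
        executor_ctx = {"subtask": sub}  # ephemeral, isolated workspace
        raw_trace, summary = backbone("executor", executor_ctx)
        # Key decoupling step: the raw tool trace is discarded here;
        # only the compact summary re-enters the Planner's context.
        planner_ctx["notes"].append(summary)
    return planner_ctx

episode = run_episode("capital of France and capital of Japan")
print(episode["notes"])
```

In PECO-style training, a single trajectory-level reward computed at the end of such an episode would then be applied to the policy updates of both roles, since both are instantiations of the same backbone conditioned on different contexts.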
Similar Papers
CoDA: Agentic Systems for Collaborative Data Visualization
Artificial Intelligence
Helps computers make charts from your words.
Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem
Machine Learning (CS)
Helps robots learn to work together better.
CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation
Computation and Language
AI tutors now teach better, not just give answers.