Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs
By: Wanyang Hong, Zhaoning Zhang, Yi Chen, and more
Potential Business Impact:
Helps chatbots keep track of what you said earlier in a conversation.
Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay: a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context by applying its priority attention: it selectively integrates relevant episodic information while always prioritizing global instructions. Experiments on multiple multi-turn conversation benchmarks, including MT-Eval and Long-MT-Bench+, show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
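The abstract describes the two-memory design only at a high level, so the following Python sketch is purely illustrative of the context-assembly idea rather than the paper's implementation: the class name RheaContextSketch, the token-overlap score standing in for Rhea's heuristic retrieval, the reply truncation standing in for asymmetric noise control, and the top-k cutoff are all assumptions made for illustration.

from collections import Counter

class RheaContextSketch:
    """Illustrative only: split history into instructional and episodic memory."""

    def __init__(self, top_k=3):
        self.instructional_memory = []   # persistent global constraints (IM)
        self.episodic_memory = []        # past (user, model) exchanges (EM)
        self.top_k = top_k

    def add_instruction(self, text):
        # IM entries are kept verbatim and always included (structural priority).
        self.instructional_memory.append(text)

    def add_exchange(self, user_turn, model_turn):
        # Stand-in for asymmetric noise control: keep the user turn in full
        # but store only a truncated model reply.
        self.episodic_memory.append((user_turn, model_turn[:200]))

    def _relevance(self, query, exchange):
        # Stand-in for heuristic context retrieval: simple token overlap.
        q = Counter(query.lower().split())
        e = Counter((exchange[0] + " " + exchange[1]).lower().split())
        return sum((q & e).values())

    def build_context(self, query):
        # Priority attention, roughly: global instructions first, then the
        # most relevant episodic exchanges, then the current query.
        ranked = sorted(self.episodic_memory,
                        key=lambda ex: self._relevance(query, ex),
                        reverse=True)[: self.top_k]
        parts = ["[INSTRUCTIONS]"] + self.instructional_memory
        parts += ["[RELEVANT HISTORY]"]
        parts += [f"User: {u}\nAssistant: {a}" for u, a in ranked]
        parts += ["[CURRENT QUERY]", query]
        return "\n".join(parts)

A short usage example under the same assumptions:

memory = RheaContextSketch()
memory.add_instruction("Always answer in formal English.")
memory.add_exchange("What is the capital of France?", "The capital of France is Paris.")
print(memory.build_context("And what is its population?"))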
Similar Papers
HEMA: A Hippocampus-Inspired Extended Memory Architecture for Long-Context AI Conversations
Computation and Language
Lets computers remember long talks perfectly.
Reuse, Don't Recompute: Efficient Large Reasoning Model Inference via Memory Orchestration
Multiagent Systems
Lets computers remember answers to save time.
Evaluating Long-Term Memory for Long-Context Question Answering
Computation and Language
Helps computers remember conversations better.