Active Context Compression: Autonomous Memory Management in LLM Agents
By: Nikhil Verma
Large Language Model (LLM) agents struggle with long-horizon software engineering tasks because of "Context Bloat." As the interaction history grows, computational cost rises, latency increases, and reasoning degrades as the model is distracted by irrelevant past errors. Existing solutions often rely on passive, external summarization mechanisms that the agent cannot control. This paper proposes Focus, an agent-centric architecture inspired by the biological exploration strategies of Physarum polycephalum (slime mold). The Focus Agent autonomously decides when to consolidate key learnings into a persistent "Knowledge" block and then actively withdraws (prunes) the raw interaction history. Using an optimized scaffold that matches industry best practices (persistent bash + string-replacement editor), we evaluated Focus on N=5 context-intensive instances from SWE-bench Lite using Claude Haiku 4.5. With aggressive prompting that encourages frequent compression, Focus achieves a 22.7% token reduction (14.9M to 11.5M tokens) while maintaining identical accuracy (3/5 = 60% for both agents). Focus performed an average of 6.0 autonomous compressions per task, with token savings of up to 57% on individual instances. We demonstrate that capable models can autonomously self-regulate their context when given appropriate tools and prompting, opening pathways to cost-aware agentic systems that do not sacrifice task performance.
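The consolidate-then-prune step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `FocusContext`, `start_focus`, `complete_focus`, and `render` are hypothetical, and the sketch assumes the agent itself supplies the summary text when it decides to compress.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    role: str      # e.g. "user", "assistant", or "tool"
    content: str


@dataclass
class FocusContext:
    """Hypothetical container: a persistent knowledge block plus raw history."""
    knowledge: list[str] = field(default_factory=list)
    history: list[Message] = field(default_factory=list)
    checkpoint: int = 0  # index in history where the current exploration began

    def start_focus(self) -> None:
        # Mark where the exploration phase begins; earlier messages are kept.
        self.checkpoint = len(self.history)

    def complete_focus(self, summary: str) -> int:
        """Consolidate learnings, then prune raw messages since the checkpoint.

        Returns the number of raw messages removed.
        """
        pruned = len(self.history) - self.checkpoint
        self.knowledge.append(summary)      # persist the key learning
        del self.history[self.checkpoint:]  # withdraw the raw interaction log
        return pruned

    def render(self) -> str:
        # Assumed prompt layout: knowledge block first, remaining history after.
        knowledge = "\n".join(f"- {item}" for item in self.knowledge)
        raw = "\n".join(f"[{m.role}] {m.content}" for m in self.history)
        return f"Knowledge:\n{knowledge}\nHistory:\n{raw}"


ctx = FocusContext()
ctx.history.append(Message("user", "Fix the failing test in utils.py"))
ctx.start_focus()
ctx.history.append(Message("tool", "long grep output"))
ctx.history.append(Message("tool", "long traceback"))
pruned = ctx.complete_focus("Bug is an off-by-one in utils.parse_range")
print(pruned, len(ctx.history))
```

In this sketch the task statement survives the compression while the two tool outputs are replaced by a one-line learning, which is the mechanism by which token usage can fall without discarding what the agent learned.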
Similar Papers
Adaptive Focus Memory for Language Models
Computation and Language
Keeps chatbots remembering important details cheaply.
FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
Computation and Language
Helps robots find important web info faster, safer.
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
Artificial Intelligence
Teaches computers to remember important things better.