Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
By: Yuanzhe Hu, Yu Wang, Julian McAuley
Potential Business Impact:
Helps AI remember and use information better.
Recent benchmarks for Large Language Model (LLM) agents primarily focus on evaluating reasoning, planning, and execution capabilities, while another critical component, memory (how agents memorize, update, and retrieve long-term information), remains under-evaluated due to the lack of suitable benchmarks. We refer to agents with memory mechanisms as memory agents. In this paper, we identify four core competencies essential for memory agents: accurate retrieval, test-time learning, long-range understanding, and conflict resolution. Existing datasets either rely on limited context lengths or are tailored for static, long-context settings such as book-based QA, which do not reflect the interactive, multi-turn nature of memory agents that accumulate information incrementally. Furthermore, no existing benchmark covers all four competencies. We therefore introduce MemoryAgentBench, a new benchmark designed specifically for memory agents. Our benchmark combines reformulated existing datasets with newly constructed ones, covers all four memory competencies, and provides a systematic and challenging testbed for assessing memory quality. We evaluate a diverse set of memory agents, ranging from simple context-based and retrieval-augmented generation (RAG) systems to advanced agents with external memory modules and tool integration. Empirical results reveal that current methods fall short of mastering all four competencies, underscoring the need for further research into comprehensive memory mechanisms for LLM agents.
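To make the incremental, multi-turn setting concrete, here is a toy sketch. Everything in it is hypothetical and is not MemoryAgentBench's actual interface, data, or scoring: the `memorize`/`answer` methods, the two agent classes, the `"<key> is <value>"` fact format, and exact-match accuracy are all assumptions made for illustration. The agent receives the input one chunk per turn and is only queried afterwards. A context-based agent with a bounded window forgets early facts (a long-range failure), while a store that overwrites stale values handles a later, conflicting fact (conflict resolution).

```python
from dataclasses import dataclass, field

# Hypothetical toy data: facts arrive as "<key> is <value>" chunks, one per turn.
# The last chunk contradicts the first, so the newest fact should win.
CHUNKS = [
    "Alice's city is Paris",
    "Bob's job is chef",
    "Carol's pet is cat",
    "Alice's city is Rome",
]
QA = [("Where is Alice's city?", "Rome"), ("What is Bob's job?", "chef")]


@dataclass
class ContextAgent:
    """Toy context-based agent: keeps only the `max_chunks` most recent chunks."""
    max_chunks: int
    buffer: list[str] = field(default_factory=list)

    def memorize(self, chunk: str) -> None:
        self.buffer.append(chunk)
        if len(self.buffer) > self.max_chunks:
            self.buffer.pop(0)  # FIFO truncation: the oldest chunk is forgotten

    def answer(self, question: str) -> str:
        # Scan newest-first so a later fact takes precedence over an older one.
        for chunk in reversed(self.buffer):
            key, _, value = chunk.partition(" is ")
            if key in question:
                return value
        return "unknown"


@dataclass
class KeyValueAgent:
    """Toy external-memory agent: stores every fact, overwriting stale values."""
    store: dict[str, str] = field(default_factory=dict)

    def memorize(self, chunk: str) -> None:
        key, _, value = chunk.partition(" is ")
        self.store[key] = value  # an update replaces the conflicting old fact

    def answer(self, question: str) -> str:
        for key, value in self.store.items():
            if key in question:
                return value
        return "unknown"


def evaluate(agent, chunks, qa_pairs) -> float:
    """Feed chunks one turn at a time, then score exact-match accuracy on the questions."""
    for chunk in chunks:
        agent.memorize(chunk)
    correct = sum(agent.answer(q) == a for q, a in qa_pairs)
    return correct / len(qa_pairs)


print(evaluate(ContextAgent(max_chunks=2), CHUNKS, QA))  # 0.5
print(evaluate(KeyValueAgent(), CHUNKS, QA))             # 1.0
```

With `max_chunks=2`, the context agent has already dropped Bob's fact by query time and scores 0.5. The overwriting key-value store answers both questions, including the updated fact about Alice. Real memory agents and the benchmark's tasks are far richer; the sketch only shows why ingesting information incrementally, rather than all at once, exposes different failure modes.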
Similar Papers
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Computation and Language
Tests how well AI remembers stories and makes choices.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
Computation and Language
Helps AI agents remember and learn from past tasks.
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
Artificial Intelligence
Helps AI remember and learn new things.