LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
By: Junhao Zheng, Xidi Cai, Qiuke Li, and more
Potential Business Impact:
Helps AI remember and learn new things.
Lifelong learning is essential for intelligent agents operating in dynamic environments. Current large language model (LLM)-based agents, however, remain stateless and unable to accumulate or transfer knowledge over time. Existing benchmarks treat agents as static systems and fail to evaluate lifelong learning capabilities. We present LifelongAgentBench, the first unified benchmark designed to systematically assess the lifelong learning ability of LLM agents. It provides skill-grounded, interdependent tasks across three interactive environments (Database, Operating System, and Knowledge Graph), with automatic label verification, reproducibility, and modular extensibility. Extensive experiments reveal that conventional experience replay has limited effectiveness for LLM agents due to irrelevant information and context length constraints. We further introduce a group self-consistency mechanism that significantly improves lifelong learning performance. We hope LifelongAgentBench will advance the development of adaptive, memory-capable LLM agents.
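The abstract does not spell out how group self-consistency works, so the sketch below is only one plausible reading: replayed experiences are split into small groups so each prompt stays within the context window, the model proposes an action per group, and a majority vote selects the final action. All names here (group_self_consistency, generate_action, group_size) are illustrative assumptions, not APIs or definitions from LifelongAgentBench.

```python
# Minimal, hypothetical sketch of a group self-consistency step for an LLM agent.
# Assumption: "group self-consistency" = partition replayed experiences into groups,
# query the model once per group, and majority-vote over the proposed actions.
from collections import Counter
from typing import Callable, List


def group_self_consistency(
    task_prompt: str,
    experiences: List[str],
    generate_action: Callable[[str], str],
    group_size: int = 4,
) -> str:
    """Return the action chosen by majority vote across experience groups."""
    # Partition past experiences into fixed-size groups so each prompt stays short
    # (mitigating the context-length issue the abstract attributes to naive replay).
    groups = [
        experiences[i : i + group_size]
        for i in range(0, len(experiences), group_size)
    ] or [[]]

    candidates = []
    for group in groups:
        context = "\n".join(group)
        prompt = f"{context}\n\nTask:\n{task_prompt}\nAction:"
        candidates.append(generate_action(prompt).strip())

    # Majority vote; ties resolve to the earliest-seen candidate.
    return Counter(candidates).most_common(1)[0][0]
```

One way to read the design: because each group contributes only a slice of the accumulated experience, no single prompt is dominated by irrelevant replayed examples, and the vote across groups filters out actions that only appear when the model is distracted by noisy context.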
Similar Papers
Survey on Evaluation of LLM-based Agents
Artificial Intelligence
Tests how smart AI agents can act and learn.
Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark
Artificial Intelligence
AI learns like a person, growing smarter over time.
Multi-Mission Tool Bench: Assessing the Robustness of LLM based Agents through Related and Dynamic Missions
Artificial Intelligence
Tests AI that handles many jobs at once.