Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
By: Xinyu Zhu, Yuzhu Cai, Zexi Liu, and more
The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond the complexity of human precedent.
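The abstract describes HCC as a multi-tiered cache that distills transient execution traces into stable knowledge and cross-task wisdom. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the tier names, capacities, and promotion rule (evicted traces are reduced to a distilled "lesson"; lessons seen repeatedly are promoted to a stable "wisdom" tier) are all assumptions made for illustration.

```python
# Illustrative sketch of a hierarchical cognitive cache (NOT the paper's
# actual implementation; tier names and the promotion rule are hypothetical).
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class HierarchicalCognitiveCache:
    trace_capacity: int = 4      # transient tier: raw execution traces kept verbatim
    promote_threshold: int = 2   # a lesson distilled this often becomes "wisdom"
    traces: list = field(default_factory=list)
    knowledge: Counter = field(default_factory=Counter)
    wisdom: set = field(default_factory=set)

    def record(self, trace: str, lesson: str) -> None:
        """Store a raw trace; on overflow, distill the oldest into its lesson."""
        self.traces.append((trace, lesson))
        while len(self.traces) > self.trace_capacity:
            _, old_lesson = self.traces.pop(0)   # evict the oldest raw trace
            self.knowledge[old_lesson] += 1      # keep only its distilled lesson
            if self.knowledge[old_lesson] >= self.promote_threshold:
                self.wisdom.add(old_lesson)      # stable, cross-task tier

    def context(self) -> list:
        """What the agent sees: wisdom + distilled knowledge + recent raw traces."""
        return sorted(self.wisdom) + sorted(self.knowledge) + [t for t, _ in self.traces]
```

Under this sketch, the context handed to the agent stays bounded: old execution detail is dropped, but its distilled guidance survives and, if it recurs, is stabilized across tasks:

```python
cache = HierarchicalCognitiveCache(trace_capacity=2)
for i in range(5):
    cache.record(f"run {i} failed: out of memory", "reduce batch size")
# Only the 2 most recent raw traces remain; the recurring lesson has been
# distilled and promoted to the wisdom tier.
```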