Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents
By: Hongqiu Ni, Jiabao Zhang, Guopeng Li, and others
Potential Business Impact:
Makes LLM-powered agent programs finish tasks faster.
Large Language Models (LLMs) are increasingly deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services such as Web APIs, create a mismatch between their execution pattern and the scheduling granularity of existing inference systems such as vLLM. Existing systems typically optimize each segment in isolation, which prevents them from minimizing the end-to-end latency of the complete agentic workflow, i.e., the global Job Completion Time (JCT) over the entire request lifecycle. To address this limitation, we propose Astraea, a service engine designed to shift optimization from local segments to the global request lifecycle. Astraea employs a state-aware, hierarchical scheduling algorithm that integrates a request's historical state with predictions of its future behavior. It dynamically classifies requests as I/O-intensive or compute-intensive and uses an enhanced HRRN (Highest Response Ratio Next) policy to balance efficiency and fairness. Astraea also implements an adaptive KV cache manager that handles agent state during I/O waits according to system memory pressure. Extensive experiments show that Astraea reduces average JCT by up to 25.5% compared to baseline methods. Moreover, our approach demonstrates strong robustness and stability under high load across various model scales.
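To make the scheduling idea concrete, the sketch below shows the classic HRRN response ratio, which the abstract says Astraea enhances: priority is (waiting time + estimated service time) / estimated service time, so long-waiting requests cannot starve while short jobs are still favored. This is a minimal illustration, not Astraea's actual algorithm; the `AgentRequest` fields, the `io_bound` classification flag, and the 1.2 boost for I/O-intensive requests are all hypothetical assumptions added for illustration.

```python
from dataclasses import dataclass

@dataclass
class AgentRequest:
    # Hypothetical request state; Astraea's real bookkeeping is richer.
    req_id: str
    arrival: float          # time the request entered the queue (seconds)
    est_service: float      # predicted remaining compute time (seconds)
    io_bound: bool = False  # True if classified as I/O-intensive

def hrrn_priority(req: AgentRequest, now: float) -> float:
    """Classic HRRN response ratio: (wait + service) / service.
    The ratio grows with waiting time (no starvation) and grows
    faster for short jobs (efficiency)."""
    wait = now - req.arrival
    return (wait + req.est_service) / req.est_service

def pick_next(queue: list[AgentRequest], now: float) -> AgentRequest:
    """Select the request with the highest response ratio.
    I/O-bound requests get a small boost (an assumed weighting,
    not from the paper) so their network calls are dispatched
    early and can overlap with other requests' compute."""
    def score(r: AgentRequest) -> float:
        boost = 1.2 if r.io_bound else 1.0
        return boost * hrrn_priority(r, now)
    return max(queue, key=score)
```

For example, at `now = 10.0` a request that arrived at `t = 6` with one second of estimated work has ratio (4 + 1) / 1 = 5 and outranks one that arrived at `t = 0` with five seconds of work, whose ratio is (10 + 5) / 5 = 3.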
Similar Papers
Astra: A Multi-Agent System for GPU Kernel Performance Optimization
Distributed, Parallel, and Cluster Computing
Makes computer programs run much faster automatically.
ASTRA: Agentic Steerability and Risk Assessment Framework
Cryptography and Security
Makes AI agents follow rules to prevent harm.
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
Robotics
Space robot learns to control temperature better.