LLMSched: Uncertainty-Aware Workload Scheduling for Compound LLM Applications
By: Botao Zhu, Chen Chen, Xiaoyi Fan, and others
Potential Business Impact:
Makes multi-step AI applications finish faster by scheduling their stages around what is still uncertain about them.
Developing compound Large Language Model (LLM) applications is an increasingly prevalent approach to solving real-world problems. In these applications, an LLM collaborates with various external modules, including APIs and even other LLMs, to realize complex intelligent services. However, we reveal that the intrinsic duration and structural uncertainty of compound LLM applications poses great challenges for LLM service providers in serving and scheduling them efficiently. In this paper, we propose LLMSched, an uncertainty-aware scheduling framework for emerging compound LLM applications. In LLMSched, we first design a novel DAG-based model to describe uncertain compound LLM applications. We then adopt a Bayesian network to comprehensively profile compound LLM applications and identify uncertainty-reducing stages, along with an entropy-based mechanism to quantify their uncertainty reduction. Combining an uncertainty-reduction strategy with a job completion time (JCT)-efficient scheme, we further propose an efficient scheduler that lowers the average JCT. Evaluations via both simulations and testbed experiments on various representative compound LLM applications show that, compared with existing state-of-the-art scheduling schemes, LLMSched reduces the average JCT by 14% to 79%.
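The abstract names two mechanisms: an entropy-based score for how much completing a stage reduces uncertainty about the application's DAG, and a scheduler that combines that signal with a JCT-efficient term. The paper's exact formulation is not given here, so the sketch below is only a minimal illustration under assumed definitions; the Stage fields, the alpha trade-off knob, and the example belief distributions are hypothetical stand-ins, not LLMSched's actual design.

```python
import math
from dataclasses import dataclass

def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uncertainty_reduction(prior, posterior):
    """Entropy drop once a stage's outcome is observed: H(prior) - H(posterior)."""
    return entropy(prior) - entropy(posterior)

@dataclass
class Stage:
    name: str
    expected_duration: float   # profiled mean runtime in seconds (assumed field)
    expected_info_gain: float  # anticipated entropy drop in bits (assumed field)

def priority(stage, alpha=0.5):
    """Blend a JCT-efficient (shortest-expected-job-first) term with an
    uncertainty-reduction term; higher scores run first. The linear blend
    and alpha knob are illustrative, not taken from the paper."""
    return alpha * stage.expected_info_gain + (1 - alpha) / stage.expected_duration

# Hypothetical compound application: after a "router" LLM call, the downstream
# DAG can take one of three shapes. Observing the router's output collapses
# most of that structural uncertainty.
prior = [0.5, 0.3, 0.2]        # belief over candidate DAG structures
posterior = [0.9, 0.07, 0.03]  # belief after the router stage completes
print(f"structural uncertainty removed: "
      f"{uncertainty_reduction(prior, posterior):.3f} bits")

stages = [Stage("router", 1.2, 0.9), Stage("summarize", 4.0, 0.1)]
for s in sorted(stages, key=priority, reverse=True):
    print(s.name, f"priority={priority(s):.3f}")
```

Under these made-up numbers, the short, high-information router stage outranks the longer, already-predictable summarization stage, which captures the intuition behind pairing an uncertainty-reduction strategy with a JCT-efficient scheme.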
Similar Papers
SmartLLMs Scheduler: A Framework for Cost-Effective LLMs Utilization
Software Engineering
Schedules LLM requests so AI answers arrive faster and cost less to serve.
Evaluating Large Language Models for Workload Mapping and Scheduling in Heterogeneous HPC Systems
Distributed, Parallel, and Cluster Computing
Tests how well LLMs can map and schedule workloads across mixed HPC hardware from plain-language descriptions.
A Predictive and Synergistic Two-Layer Scheduling Framework for LLM Serving
Distributed, Parallel, and Cluster Computing
Uses a predictive two-layer scheduler so AI answers arrive faster and more reliably.