Learning Virtual Machine Scheduling in Cloud Computing through Language Agents
By: JieHao Wu, Ziwei Wang, Junjie Sheng and more
Potential Business Impact:
Helps computers pack more tasks into cloud servers.
In cloud services, virtual machine (VM) scheduling is a typical Online Dynamic Multidimensional Bin Packing (ODMBP) problem, characterized by large-scale complexity and fluctuating demands. Traditional optimization methods struggle to adapt to real-time changes, domain-expert-designed heuristic approaches suffer from rigid strategies, and existing learning-based methods often lack generalizability and interpretability. To address these limitations, this paper proposes a hierarchical language agent framework named MiCo, which provides a large language model (LLM)-driven heuristic design paradigm for solving ODMBP. Specifically, ODMBP is formulated as a Semi-Markov Decision Process with Options (SMDP-Option), enabling dynamic scheduling through a two-stage architecture, i.e., Option Miner and Option Composer. Option Miner utilizes LLMs to discover diverse and useful non-context-aware strategies by interacting with constructed environments. Option Composer employs LLMs to discover a composing strategy that integrates the non-context-aware strategies with the contextual ones. Extensive experiments on real-world enterprise datasets demonstrate that MiCo achieves a 96.9% competitive ratio in large-scale scenarios involving more than 10,000 virtual machines. It maintains high performance even under nonstationary request flows and diverse configurations, thus validating its effectiveness in complex and large-scale cloud environments.
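To make the two-level idea in the abstract concrete, here is a minimal Python sketch of options and a composer for two-dimensional (CPU, memory) packing. It is an illustration under stated assumptions, not MiCo's implementation: the options `first_fit` and `best_fit` and the 50% utilization switch in `composer` are hand-written stand-ins for strategies that the paper says are discovered by LLMs, and the names `Server`, `composer`, and `schedule`, as well as the rule that an episode ends at the first rejected request, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Request = Tuple[int, int]  # (cpu, mem) demand of one VM; hypothetical encoding


@dataclass
class Server:
    cpu: int
    mem: int
    used_cpu: int = 0
    used_mem: int = 0

    def fits(self, req: Request) -> bool:
        # A request fits only if every resource dimension has room.
        return (self.used_cpu + req[0] <= self.cpu
                and self.used_mem + req[1] <= self.mem)

    def place(self, req: Request) -> None:
        self.used_cpu += req[0]
        self.used_mem += req[1]

    def slack_after(self, req: Request) -> float:
        # Normalized leftover capacity, summed over both dimensions.
        return ((self.cpu - self.used_cpu - req[0]) / self.cpu
                + (self.mem - self.used_mem - req[1]) / self.mem)


# An option maps (cluster state, request) to a server index, or None.
Option = Callable[[List[Server], Request], Optional[int]]


def first_fit(servers: List[Server], req: Request) -> Optional[int]:
    """Non-context-aware option: the first server with enough room."""
    for i, server in enumerate(servers):
        if server.fits(req):
            return i
    return None


def best_fit(servers: List[Server], req: Request) -> Optional[int]:
    """Non-context-aware option: the feasible server left with least slack."""
    feasible = [i for i, s in enumerate(servers) if s.fits(req)]
    if not feasible:
        return None
    return min(feasible, key=lambda i: servers[i].slack_after(req))


def composer(servers: List[Server], req: Request) -> Option:
    """Context-aware choice of option (assumed rule, not from the paper):
    switch to best-fit once cluster CPU utilization exceeds 50%."""
    total_cpu = sum(s.cpu for s in servers)
    used_cpu = sum(s.used_cpu for s in servers)
    return best_fit if used_cpu / total_cpu > 0.5 else first_fit


def schedule(servers: List[Server], requests: List[Request]) -> int:
    """Place requests online; return how many were placed.
    Assumption: the episode ends at the first request that cannot be placed."""
    placed = 0
    for req in requests:
        option = composer(servers, req)
        index = option(servers, req)
        if index is None:
            break
        servers[index].place(req)
        placed += 1
    return placed
```

Calling `schedule([Server(8, 16), Server(8, 16)], requests)` with a list of `(cpu, mem)` tuples returns the number of VMs placed before the first rejection. Keeping each option as a plain function is one way to keep strategies readable and swappable, which matches the interpretability motivation stated in the abstract.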
Similar Papers
Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments?
Distributed, Parallel, and Cluster Computing
Makes computers run faster and use less power.
Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
Machine Learning (CS)
Makes computers learn faster using less power.
Semantic-Aware Scheduling for GPU Clusters with Large Language Models
Machine Learning (CS)
Makes computer jobs finish much faster.