Semantic-Aware Scheduling for GPU Clusters with Large Language Models
By: Zerui Wang, Qinghao Hu, Ana Klimovic, and more
Potential Business Impact:
Makes computer jobs finish much faster.
Deep learning (DL) schedulers are pivotal in optimizing resource allocation in GPU clusters, but they operate with a critical limitation: they are largely blind to the semantic context of the jobs they manage. This forces them to rely on limited metadata, leading to high profiling overhead, unreliable duration estimation, inadequate failure handling, and poor observability. To address this, we propose SchedMate, a framework that bridges the semantic gap by systematically extracting deep insights from overlooked, unstructured data sources: source code, runtime logs, and historical jobs. SchedMate enhances existing schedulers non-intrusively through three LLM-based components, and our implementation integrates seamlessly with existing deep learning schedulers. Evaluations on a 128-GPU physical cluster and extensive simulations on production traces show that SchedMate reduces average job completion time by up to 1.91x, substantially improving scheduling performance and demonstrating the critical role of semantic awareness in modern DL scheduling.
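To see why reliable duration estimates matter for job completion time, consider a minimal sketch (not SchedMate's actual algorithm; the job durations here are hypothetical): with accurate estimates, a scheduler can run shortest jobs first instead of first-come-first-served, which can substantially cut the average completion time on a contended queue.

```python
def avg_jct(durations):
    """Average job completion time when jobs run back-to-back on one
    device, in the given order (each job's completion time is the
    cumulative runtime up to and including it)."""
    elapsed, total = 0.0, 0.0
    for d in durations:
        elapsed += d
        total += elapsed
    return total / len(durations)

# Hypothetical queue: a long job arrives before two short ones.
arrival_order = [10.0, 1.0, 2.0]

# FIFO runs jobs in arrival order; a semantic-aware scheduler with
# trustworthy duration estimates can instead run shortest-job-first.
fifo_jct = avg_jct(arrival_order)          # (10 + 11 + 13) / 3 ≈ 11.33
sjf_jct = avg_jct(sorted(arrival_order))   # (1 + 3 + 13) / 3 ≈ 5.67
```

In this toy example shortest-job-first halves the average completion time; the catch is that shortest-job-first needs duration estimates up front, which is exactly the kind of information the paper argues can be mined from source code, logs, and historical jobs rather than from costly profiling.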
Similar Papers
Resource Heterogeneity-Aware and Utilization-Enhanced Scheduling for Deep Learning Clusters
Distributed, Parallel, and Cluster Computing
Makes computer learning faster and better.
Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters
Distributed, Parallel, and Cluster Computing
Makes computer jobs run faster and use less power.
SLO-Aware Scheduling for Large Language Model Inferences
Distributed, Parallel, and Cluster Computing
Makes AI answer questions faster and better.