Hummingbird: SLO-Oriented GPU Preemption at Microsecond-scale
By: Tiancheng Hu, Chenxi Wang, Ting Cao and more
Potential Business Impact:
Lets GPUs run background work alongside urgent tasks without slowing the urgent tasks down.
Existing GPU-sharing techniques, including spatial and temporal sharing, aim to improve utilization, but they struggle to ensure service-level objective (SLO) adherence and maximize efficiency at the same time because closed-source GPUs lack fine-grained task scheduling. This paper presents Hummingbird, an SLO-oriented GPU scheduling system that overcomes these challenges by enabling microsecond-scale preemption on closed-source GPUs while effectively harvesting idle GPU time slices. Comprehensive evaluations across diverse GPU architectures show that Hummingbird improves the SLO attainment of high-priority tasks by 9.7x and 3.5x compared to the state-of-the-art spatial-sharing and temporal-sharing approaches, respectively. When a high-priority task is collocated with low-priority tasks on Hummingbird, its SLO attainment drops by less than 1% compared to exclusive execution. Meanwhile, the throughput of the low-priority task outperforms the state-of-the-art temporal-sharing approaches by 2.4x. Hummingbird is thus effective at ensuring SLO adherence while improving GPU utilization.
Similar Papers
Reducing Fragmentation and Starvation in GPU Clusters through Dynamic Multi-Objective Scheduling
Distributed, Parallel, and Cluster Computing
Makes AI computers use their power better.
ML Inference Scheduling with Predictable Latency
Machine Learning (CS)
Makes AI run faster without slowing down.
GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
Distributed, Parallel, and Cluster Computing
Saves money by better sharing computer power.