GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management

Published: September 14, 2025 | arXiv ID: 2509.11134v1

By: Jiaang Duan, Shenglin Xu, Shiyou Qian and more

Potential Business Impact:

Saves money by better sharing computer power.

Business Areas:

Cloud Computing, Internet Services, Software

The surge in large language models (LLMs) has fundamentally reshaped the landscape of GPU usage patterns, creating an urgent need for more efficient management strategies. While cloud providers employ spot instances to reduce costs for low-priority (LP) tasks, existing schedulers still grapple with high eviction rates and lengthy queuing times. To address these limitations, we present GFS, a novel preemptive scheduling framework that enhances service-level objective (SLO) compliance for high-priority (HP) tasks while minimizing preemptions to LP tasks. Firstly, GFS utilizes a lightweight forecasting model that predicts GPU demand among different tenants, enabling proactive resource management. Secondly, GFS employs a dynamic allocation mechanism to adjust the spot quota for LP tasks with guaranteed durations. Lastly, GFS incorporates a preemptive scheduling policy that prioritizes HP tasks while minimizing the impact on LP tasks. We demonstrate the effectiveness of GFS through both real-world implementation and simulations. The results show that GFS reduces eviction rates by 33.0%, and cuts queuing delays by 44.1% for LP tasks. Furthermore, GFS enhances the GPU allocation rate by up to 22.8% in real production clusters. In a production cluster of more than 10,000 GPUs, GFS yields roughly $459,715 in monthly benefits.
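
The abstract names three components (demand forecasting, a dynamic spot quota, and a preemption policy) without giving their internals. The following is a minimal illustrative sketch of how such pieces could fit together, not the GFS algorithm itself. Every name here (`LPTask`, `forecast_hp_demand`, `spot_quota`, `pick_victims`), the moving-average forecaster, the `headroom` parameter, and the "least elapsed work first" eviction rule are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LPTask:
    """A running low-priority (spot) task. Hypothetical structure."""
    name: str
    gpus: int
    elapsed_min: float  # work that would be lost on eviction (assumed cost metric)


def forecast_hp_demand(history: list[int], window: int = 3) -> float:
    """Assumed forecaster: mean of the last `window` HP demand samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def spot_quota(total_gpus: int, predicted_hp: float, headroom: float = 0.25) -> int:
    """GPUs offered to LP tasks after reserving predicted HP demand plus headroom."""
    reserved = math.ceil(predicted_hp * (1 + headroom))
    return max(0, total_gpus - reserved)


def pick_victims(running: list[LPTask], gpus_needed: int) -> list[LPTask]:
    """Greedy choice: evict LP tasks with the least elapsed work first.

    Returns an empty list if evicting every LP task would still not free
    enough GPUs, so no task is preempted in vain.
    """
    victims: list[LPTask] = []
    freed = 0
    for task in sorted(running, key=lambda t: t.elapsed_min):
        if freed >= gpus_needed:
            break
        victims.append(task)
        freed += task.gpus
    return victims if freed >= gpus_needed else []


if __name__ == "__main__":
    predicted = forecast_hp_demand([40, 50, 60, 70])
    print("predicted HP demand:", predicted)
    print("spot quota:", spot_quota(100, predicted))
    running = [LPTask("a", 4, 120.0), LPTask("b", 2, 5.0), LPTask("c", 2, 30.0)]
    print("evict:", [t.name for t in pick_victims(running, gpus_needed=3)])
```

The greedy rule is only one plausible way to limit the impact on LP tasks; a real policy would likely also weigh the guaranteed durations the abstract mentions.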

FREESH: Fair, Resource- and Energy-Efficient Scheduling for LLM Serving on Heterogeneous GPUs

Distributed, Parallel, and Cluster Computing

Saves energy and pollution by smart computer use.

2 Nov 2025 1

88%

GreenLLM: SLO-Aware Dynamic Frequency Scaling for Energy-Efficient LLM Serving

Performance

Saves energy when computers think big thoughts.

22 Aug 2025 0

Country of Origin

🇨🇳 China

Repos / Data Links

github.com github.com

Page Count

15 pages
