Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions
By: Linggao Kong, Yuedong Xu, Lei Jiao, et al.
As foundation models grow in size, fine-tuning them becomes increasingly expensive. While GPU spot instances offer a low-cost alternative to on-demand resources, their volatile prices and availability make deadline-aware scheduling particularly challenging. We tackle this difficulty by using a mix of spot and on-demand instances. Distinctively, we demonstrate the predictability of prices and availability in a spot instance market, the power of prediction in enabling cost-efficient scheduling, and the sensitivity of such scheduling to estimation errors. We formulate an integer programming problem that captures the use of mixed instances under both price and availability dynamics. We propose a prediction-based online allocation algorithm built on the committed horizon control approach, which leverages a \emph{commitment level} to enforce a partial sequence of decisions. For settings where predictions become inaccurate, we further present a complementary online algorithm that requires no predictions. We then develop an online policy selection algorithm that learns the best policy from a pool constructed by varying the parameters of both algorithms. We prove that the prediction-based algorithm achieves tighter performance bounds as prediction error decreases, while the policy selection algorithm attains a regret bound of $\mathcal{O}(\sqrt{T})$. Experimental results demonstrate that our online framework adaptively selects the best policy under varying spot market dynamics and prediction quality, consistently outperforming baselines and improving utility by up to 54.8\%.
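The abstract does not specify the policy selection rule, but an $\mathcal{O}(\sqrt{T})$ regret bound against the best fixed policy in a finite pool is characteristic of multiplicative-weights (Hedge) style online learning. The following is a minimal illustrative sketch of that generic approach, not the paper's actual algorithm; the function name, the reward-oracle interface, and the step size are all assumptions made for the example.

```python
import math
import random

def hedge_policy_selection(policies, T, rewards):
    """Illustrative multiplicative-weights (Hedge) policy selector.

    With rewards in [0, 1] and step size eta = sqrt(ln(N) / T), Hedge
    achieves regret O(sqrt(T * ln N)) against the best fixed policy in
    hindsight. `rewards(t, policies)` is a hypothetical full-information
    oracle returning the round-t reward of every policy in the pool.
    Returns the cumulative reward of the policies actually played.
    """
    n = len(policies)
    eta = math.sqrt(math.log(n) / T)  # standard Hedge step size
    weights = [1.0] * n
    total = 0.0
    for t in range(T):
        s = sum(weights)
        probs = [w / s for w in weights]
        # Sample one policy to play this round, proportionally to weight.
        i = random.choices(range(n), weights=probs)[0]
        r = rewards(t, policies)  # reward vector in [0, 1], one entry per policy
        total += r[i]
        # Exponentially up-weight policies that did well this round.
        weights = [w * math.exp(eta * ri) for w, ri in zip(weights, r)]
    return total
```

Under this sketch, the "pool" in the abstract would correspond to instantiations of the two online algorithms with different parameter settings (e.g. different commitment levels), and each round's reward would reflect the realized scheduling utility of each candidate policy.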