Enhancing Cluster Scheduling in HPC: A Continuous Transfer Learning for Real-Time Optimization
By: Leszek Sliwko, Jolanta Mizera-Pietraszko
Potential Business Impact:
Makes computer jobs run faster and smarter.
This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the proposed continuous transfer learning model evolves dynamically during operations, minimizing retraining needs. Evaluated on Google Cluster Data, the model achieves over 99% accuracy, reducing computational overhead and improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing machine learning integration in cluster management and paving the way for future adaptive scheduling strategies.
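The idea of a model that keeps learning during operation, rather than being retrained from scratch, can be illustrated with a minimal sketch. This is not the authors' continuous transfer learning model: it is plain online logistic regression, updated one sample at a time. The class name `OnlineLogisticModel`, the two features (share of nodes matching a task's affinity constraint, and requested CPU share), and the synthetic labelling rule are all hypothetical, chosen only for illustration, and no Google Cluster Data is used.

```python
import math
import random


class OnlineLogisticModel:
    """Logistic regression updated one sample at a time (stochastic gradient descent)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # One gradient step on the log-loss for a single observation.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def make_task(rng):
    # Hypothetical features, both in [0, 1):
    #   match: share of nodes satisfying the task's node-affinity constraint
    #   cpu:   requested CPU share
    match = rng.random()
    cpu = rng.random()
    # Synthetic rule (an assumption, not from the paper): the task is
    # "easy to place" when matching capacity exceeds demand.
    label = 1 if match > cpu else 0
    return [match, cpu], label


rng = random.Random(0)
model = OnlineLogisticModel(n_features=2, lr=0.5)

# The model is refined as each scheduling outcome arrives; there is no
# separate batch retraining phase.
for _ in range(5000):
    x, y = make_task(rng)
    model.update(x, y)

# Evaluate on fresh synthetic tasks.
held_out = [make_task(rng) for _ in range(1000)]
accuracy = sum((model.predict_proba(x) >= 0.5) == bool(y) for x, y in held_out) / len(held_out)
```

Each call to `update` costs a fixed amount of work per task, which is the property that makes incremental schemes attractive when scheduling decisions must be made in real time.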