Workflow-Driven Modeling for the Compute Continuum: An Optimization Approach to Automated System and Workload Scheduling
By: Aasish Kumar Sharma, Christian Boehme, Patrick Gelß and more
Potential Business Impact:
Automates computer tasks across cloud and supercomputers.
The convergence of IoT, Edge, Cloud, and HPC technologies creates a compute continuum that merges cloud scalability and flexibility with HPC's computational power and specialized optimizations. However, integrating cloud and HPC resources often introduces latency and communication overhead, which can hinder the performance of tightly coupled parallel applications. Additionally, achieving seamless interoperability between cloud and on-premises HPC systems requires advanced scheduling, resource management, and data transfer protocols. In the absence of an automated scheduling mechanism, users must manually allocate complex workloads across heterogeneous resources, which leads to suboptimal task placement and reduced efficiency. To overcome these challenges, we introduce a comprehensive framework based on rigorous system and workload modeling for the compute continuum. Our method employs established tools and techniques to optimize workload mapping and scheduling, enabling the automatic orchestration of tasks across both cloud and HPC infrastructures. Experimental evaluations indicate that our approach can improve scheduling efficiency, reducing execution times and enhancing resource utilization. Specifically, our MILP-based solution achieves optimal scheduling and makespan for small-scale workflows, while heuristic methods produce estimates up to 99% faster for large-scale workflows, albeit with a 5-10% deviation from optimal results. Our primary contribution is a robust system and workload modeling framework that addresses critical gaps in existing tools, paving the way for fully automated orchestration in HPC-compute continuum environments.
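The trade-off between exact and heuristic scheduling described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the workflow, runtimes, transfer delay, and all function names below are hypothetical, and an exhaustive search over task placements (with a fixed topological order) stands in for the MILP solver. The greedy routine places each task on whichever resource finishes it earliest, which is fast but not guaranteed to be optimal.

```python
from itertools import product

# Hypothetical workflow: task -> runtime in seconds on each resource (assumed values).
RUNTIME = {
    "prep":   {"cloud": 4,  "hpc": 3},
    "sim_a":  {"cloud": 10, "hpc": 4},
    "sim_b":  {"cloud": 9,  "hpc": 5},
    "report": {"cloud": 2,  "hpc": 2},
}
# Task -> list of predecessor tasks (a small DAG).
DEPS = {"prep": [], "sim_a": ["prep"], "sim_b": ["prep"], "report": ["sim_a", "sim_b"]}
ORDER = ["prep", "sim_a", "sim_b", "report"]  # one valid topological order
NODES = ["cloud", "hpc"]
TRANSFER = 2  # assumed delay when a predecessor ran on a different resource


def finish_time(task, node, placement, finish, node_free):
    """Finish time of `task` on `node`, given already-scheduled predecessors."""
    ready = max(
        (finish[p] + (TRANSFER if placement[p] != node else 0) for p in DEPS[task]),
        default=0,
    )
    return max(ready, node_free[node]) + RUNTIME[task][node]


def simulate(placement):
    """Makespan of a complete task -> node placement, tasks run in ORDER."""
    finish, node_free = {}, {n: 0 for n in NODES}
    for task in ORDER:
        node = placement[task]
        finish[task] = finish_time(task, node, placement, finish, node_free)
        node_free[node] = finish[task]
    return max(finish.values())


def greedy_schedule():
    """Heuristic: place each task on the resource with the earliest finish time."""
    placement, finish, node_free = {}, {}, {n: 0 for n in NODES}
    for task in ORDER:
        best = min(
            NODES, key=lambda n: finish_time(task, n, placement, finish, node_free)
        )
        placement[task] = best
        finish[task] = finish_time(task, best, placement, finish, node_free)
        node_free[best] = finish[task]
    return placement, max(finish.values())


def exhaustive_schedule():
    """Exact over placements: try every assignment (exponential in task count)."""
    best = None
    for combo in product(NODES, repeat=len(ORDER)):
        placement = dict(zip(ORDER, combo))
        span = simulate(placement)
        if best is None or span < best[1]:
            best = (placement, span)
    return best


if __name__ == "__main__":
    print("greedy:    ", greedy_schedule())
    print("exhaustive:", exhaustive_schedule())
```

The exhaustive search evaluates every one of the `len(NODES) ** len(ORDER)` placements, so it only scales to small workflows, whereas the greedy pass visits each task once. That mirrors, in a simplified way, why an exact formulation suits small-scale workflows and heuristics are preferred for large ones.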
Similar Papers
A Review of Tools and Techniques for Optimization of Workload Mapping and Scheduling in Heterogeneous HPC System
Distributed, Parallel, and Cluster Computing
Makes supercomputers run much faster and smarter.
Optimal Multi-Constrained Workflow Scheduling for Cyber-Physical Systems in the Edge-Cloud Continuum
Networking and Internet Architecture
Makes smart devices work faster together.
Scientific Workflow Scheduling in Cloud Considering Cold Start and Variable Pricing Model
Distributed, Parallel, and Cluster Computing
Saves money running science projects on computers.