Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications

Published: March 17, 2025 | arXiv ID: 2503.13343v1

By: Andre Merzky, Mikhail Titov, Matteo Turilli and more

Potential Business Impact:

Lets computers learn from science data faster.

Business Areas:

PaaS Software

Hybrid workflows combining traditional HPC and novel ML methodologies are transforming scientific computing. This paper presents the architecture and implementation of a scalable runtime system that extends RADICAL-Pilot with service-based execution to support AI-out-HPC workflows. Our runtime system enables distributed ML capabilities, efficient resource management, and seamless HPC/ML coupling across local and remote platforms. Preliminary experimental results show that our approach manages concurrent execution of ML models across local and remote HPC/cloud resources with minimal architectural overheads. This lays the foundation for prototyping three representative data-driven workflow applications and executing them at scale on leadership-class HPC platforms.

Integrating and Characterizing HPC Task Runtime Systems for hybrid AI-HPC workloads

Distributed, Parallel, and Cluster Computing

Makes supercomputers run science and AI faster.

25 Sep 2025

88%

RHAPSODY: Execution of Hybrid AI-HPC Workflows at Scale

Distributed, Parallel, and Cluster Computing

Lets supercomputers run AI and science together.

23 Dec 2025

87%

A Unifying Framework to Enable Artificial Intelligence in High Performance Computing Workflows

Distributed, Parallel, and Cluster Computing

Lets supercomputers and AI work together better.

5 May 2025

Page Count

9 pages
