To Stream or Not to Stream: Towards A Quantitative Model for Remote HPC Processing Decisions

Published: September 23, 2025 | arXiv ID: 2509.19532v2

By: Flavio Castro, Weijian Zheng, Joaquin Chung and more

Potential Business Impact:

Helps scientists decide when streaming instrument data to remote HPC systems will return results faster than file-based transfer.

Business Areas:
Cloud Computing Internet Services, Software

Modern scientific instruments generate data at rates that increasingly exceed local compute capabilities and, when paired with the staging and I/O overheads of file-based transfers, also render file-based use of remote high-performance computing (HPC) resources impractical for time-sensitive analysis and experimental steering. Real-time streaming frameworks promise to reduce latency and improve system efficiency, but lack a principled way to assess their feasibility. In this work, we introduce a quantitative framework and an accompanying Streaming Speed Score to evaluate whether remote HPC resources can provide timely data processing compared to local alternatives. Our model incorporates key parameters including data generation rate, transfer efficiency, remote processing power, and file input/output overhead to compute total processing completion time and identify operational regimes where streaming is beneficial. We motivate our methodology with use cases from facilities such as APS, FRIB, LCLS-II, and the LHC, and validate our approach through an illustrative case study based on LCLS-II data. Our measurements show that streaming can achieve up to 97% lower end-to-end completion time than file-based methods under high data rates, while worst-case congestion can increase transfer times by over an order of magnitude, underscoring the importance of tail latency in streaming feasibility decisions.
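
To make the kind of comparison described in the abstract concrete, here is a minimal Python sketch of a completion-time model. It is not the paper's model or its Streaming Speed Score: the abstract does not give the formulas, so everything below is an assumption made for illustration. The class `Workload`, the functions `streaming_time` and `file_based_time`, and all parameter names are hypothetical. The sketch assumes that a streaming pipeline overlaps generation, transfer, and processing (so its slowest stage sets the pace), while a file-based workflow runs those stages one after another and pays an additional file I/O overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical parameter set; names and units are illustrative assumptions."""
    data_gb: float               # total data volume produced by the instrument (GB)
    gen_rate_gb_per_s: float     # instrument data generation rate (GB/s)
    link_gb_per_s: float         # nominal network bandwidth (GB/s)
    transfer_efficiency: float   # fraction of nominal bandwidth achieved, in (0, 1]
    remote_rate_gb_per_s: float  # remote processing throughput (GB/s)
    file_io_overhead_s: float    # total time spent writing and re-reading files (s)


def effective_link_rate(w: Workload) -> float:
    """Bandwidth actually available after accounting for transfer efficiency."""
    return w.link_gb_per_s * w.transfer_efficiency


def streaming_time(w: Workload) -> float:
    """Assumed pipelined model: the slowest stage bounds end-to-end throughput."""
    bottleneck = min(w.gen_rate_gb_per_s, effective_link_rate(w), w.remote_rate_gb_per_s)
    return w.data_gb / bottleneck


def file_based_time(w: Workload) -> float:
    """Assumed sequential model: generate, stage to files, transfer, then process."""
    return (
        w.data_gb / w.gen_rate_gb_per_s
        + w.file_io_overhead_s
        + w.data_gb / effective_link_rate(w)
        + w.data_gb / w.remote_rate_gb_per_s
    )


if __name__ == "__main__":
    # Made-up numbers chosen only to exercise the functions, not measured values.
    example = Workload(
        data_gb=100.0,
        gen_rate_gb_per_s=2.0,
        link_gb_per_s=5.0,
        transfer_efficiency=0.8,
        remote_rate_gb_per_s=4.0,
        file_io_overhead_s=20.0,
    )
    print("streaming:", streaming_time(example), "s")
    print("file-based:", file_based_time(example), "s")
```

Under these assumptions, lowering `transfer_efficiency` is a crude way to explore how congestion could shift the comparison: once the effective link rate falls below the generation rate, the network becomes the bottleneck for streaming as well.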

Page Count
6 pages

Category
Computer Science:
Distributed, Parallel, and Cluster Computing