Score: 3

Learning from the Past: Adaptive Parallelism Tuning for Stream Processing Systems

Published: April 16, 2025 | arXiv ID: 2504.12074v2

By: Yuxing Han, Lixiang Chen, Haoyu Wang, and more

BigTech Affiliations: ByteDance

Potential Business Impact:

Automatically tunes how many parallel copies of each operator a streaming job runs, reducing resource usage and reconfigurations while preserving processing performance.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators. Adjusting the parallelism of these operators is crucial to handling fluctuating workloads efficiently while balancing resource usage and processing performance. However, existing methods often fail to effectively utilize execution histories or fully exploit DAG structures, limiting their ability to identify bottlenecks and determine the optimal parallelism. In this paper, we propose StreamTune, a novel approach for adaptive parallelism tuning in stream processing systems. StreamTune incorporates a pre-training and fine-tuning framework that leverages global knowledge from historical execution data for job-specific parallelism tuning. In the pre-training phase, StreamTune clusters the historical data with Graph Edit Distance and pre-trains a Graph Neural Network-based encoder per cluster to capture the correlation between the operator parallelism, DAG structures, and the identified operator-level bottlenecks. In the online tuning phase, StreamTune iteratively refines operator parallelism recommendations using an operator-level bottleneck prediction model enforced with a monotonic constraint, which aligns with the observed system performance behavior. Evaluation results demonstrate that StreamTune reduces reconfigurations by up to 29.6% and parallelism degrees by up to 30.8% in Apache Flink under a synthetic workload. In Timely Dataflow, StreamTune achieves up to an 83.3% reduction in parallelism degrees while maintaining comparable processing performance under the Nexmark benchmark, when compared to state-of-the-art methods.
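To make the pre-training phase more concrete, the sketch below illustrates the clustering step the abstract describes: grouping historical job DAGs by Graph Edit Distance (GED) so that a separate encoder could then be trained per cluster. The toy operator DAGs, the cluster count, and the use of hierarchical clustering are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: cluster streaming-job DAGs by Graph Edit Distance (GED).
# The DAGs below are hypothetical stand-ins for historical execution data.
import networkx as nx
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def make_dag(edges):
    g = nx.DiGraph()
    g.add_edges_from(edges)
    return g

# Toy operator DAGs standing in for historical streaming jobs.
dags = [
    make_dag([("source", "map"), ("map", "sink")]),
    make_dag([("source", "map"), ("map", "filter"), ("filter", "sink")]),
    make_dag([("source", "join"), ("source2", "join"), ("join", "sink")]),
    make_dag([("source", "join"), ("source2", "join"),
              ("join", "agg"), ("agg", "sink")]),
]

# Pairwise GED matrix (exact GED is costly in general; fine for tiny graphs).
n = len(dags)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = nx.graph_edit_distance(dags[i], dags[j])
        dist[i, j] = dist[j, i] = d

# Hierarchical clustering on the precomputed distances; 2 clusters is arbitrary.
labels = fcluster(linkage(squareform(dist), method="average"),
                  t=2, criterion="maxclust")
for idx, label in enumerate(labels):
    print(f"DAG {idx} -> cluster {label}")

# Per the paper's framework, a GNN-based encoder would then be pre-trained
# separately on the jobs assigned to each cluster.
```

Grouping structurally similar DAGs before pre-training keeps each encoder focused on jobs whose topology, and hence whose bottleneck behavior, is comparable; the online tuning phase then refines per-operator parallelism under a monotonic bottleneck-prediction constraint, which this sketch does not cover.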

Country of Origin
🇦🇺 🇨🇳 Australia, China

Page Count
15 pages

Category
Computer Science:
Distributed, Parallel, and Cluster Computing