Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads

Published: April 14, 2025 | arXiv ID: 2504.10184v1

By: Mert Yildiz, Alexey Rolich, Andrea Baiocchi

Potential Business Impact:

Helps computing clusters finish jobs faster through smarter job dispatching.

Business Areas:

Scheduling, Information Technology, Software

Recent workload measurements in Google data centers provide an opportunity to challenge existing models and, more broadly, to enhance the understanding of dispatching policies in computing clusters. Through extensive data-driven simulations, we aim to highlight the key features of workload traffic traces that influence response time performance under simple yet representative dispatching policies. For a given computational power budget, we vary the cluster size, i.e., the number of available servers. A job-level analysis reveals that Join Idle Queue (JIQ) and Least Work Left (LWL) exhibit an optimal working point for a fixed utilization coefficient as the number of servers is varied, whereas Round Robin (RR) demonstrates monotonically worsening performance. Additionally, we explore the accuracy of simple G/G queue approximations. When decomposing jobs into tasks, interesting results emerge; notably, the simpler, non-size-based policy JIQ appears to outperform the more "powerful" size-based LWL policy. Complementing these findings, we present preliminary results on a two-stage scheduling approach that partitions tasks based on service thresholds, illustrating that modest architectural modifications can further enhance performance under realistic workload conditions. We provide insights into these results and suggest promising directions for fully explaining the observed phenomena.
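
The three dispatching policies named in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator: the function names `simulate` and `synthetic_workload`, the FCFS service discipline, the random fallback for JIQ when no server is idle, and the synthetic Poisson/exponential workload are all assumptions made for illustration (the paper uses Google trace data). The only element taken from the abstract is that a fixed computational power budget is split across a varying number of servers.

```python
import random


def simulate(policy, arrivals, sizes, n_servers, total_speed=1.0, seed=0):
    """Mean response time of n FCFS servers under a dispatching policy.

    Hypothetical helper: the fixed power budget `total_speed` is split
    evenly, so each server runs at total_speed / n_servers.
    """
    rng = random.Random(seed)
    speed = total_speed / n_servers
    free_at = [0.0] * n_servers  # time at which each server clears its backlog
    total_response = 0.0
    for k, (t, size) in enumerate(zip(arrivals, sizes)):
        if policy == "RR":
            # Round Robin: cycle through servers, ignoring their state.
            i = k % n_servers
        elif policy == "LWL":
            # Least Work Left: pick the server with the smallest backlog.
            i = min(range(n_servers), key=lambda j: max(free_at[j] - t, 0.0))
        elif policy == "JIQ":
            # Join Idle Queue: pick an idle server if one exists,
            # otherwise fall back to a random server (assumed variant).
            idle = [j for j in range(n_servers) if free_at[j] <= t]
            i = rng.choice(idle) if idle else rng.randrange(n_servers)
        else:
            raise ValueError(f"unknown policy: {policy}")
        start = max(free_at[i], t)           # wait behind existing backlog
        free_at[i] = start + size / speed    # FCFS completion time
        total_response += free_at[i] - t
    return total_response / len(sizes)


def synthetic_workload(n_jobs, utilization, total_speed=1.0, seed=1):
    """Poisson arrivals with exponential sizes (a stand-in, not trace data)."""
    rng = random.Random(seed)
    mean_size = 1.0
    rate = utilization * total_speed / mean_size
    t, arrivals, sizes = 0.0, [], []
    for _ in range(n_jobs):
        t += rng.expovariate(rate)
        arrivals.append(t)
        sizes.append(rng.expovariate(1.0 / mean_size))
    return arrivals, sizes


if __name__ == "__main__":
    arrivals, sizes = synthetic_workload(n_jobs=20000, utilization=0.8)
    for n in (1, 4, 16, 64):
        row = {p: simulate(p, arrivals, sizes, n) for p in ("RR", "JIQ", "LWL")}
        print(n, {p: round(v, 2) for p, v in row.items()})
```

Running the script prints the mean response time per policy as the cluster size grows at fixed utilization. With this toy workload the numbers should not be expected to reproduce the paper's findings, which depend on features of the real traces.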

"Two-Stagification": Job Dispatching in Large-Scale Clusters via a Two-Stage Architecture

Distributed, Parallel, and Cluster Computing

Makes computer jobs finish faster by sorting them.

5 May 2025 0

89%

The Merit of Simple Policies: Buying Performance With Parallelism and System Architecture

Distributed, Parallel, and Cluster Computing

Makes computer jobs finish faster with smart server setups.

20 Mar 2025 0

87%

Geometric lower bounds for the steady-state occupancy of processing networks with limited connectivity

Probability

Makes computer networks handle more tasks faster.

13 May 2025 0

Country of Origin

🇮🇹 Italy

Repos / Data Links

github.com

Page Count

9 pages
