Morpheus: Lightweight RTT Prediction for Performance-Aware Load Balancing
By: Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi and more
Potential Business Impact:
Predicts app response delays so requests go to the fastest servers, making apps run faster.
Distributed applications increasingly demand low end-to-end latency, especially in edge and cloud environments where co-located workloads contend for limited resources. Traditional load-balancing strategies are typically reactive and rely on outdated or coarse-grained metrics, often leading to suboptimal routing decisions and increased tail latencies. This paper investigates the use of round-trip time (RTT) predictors to enhance request routing by anticipating application latency. We develop lightweight and accurate RTT predictors that are trained on time-series monitoring data collected from a Kubernetes-managed GPU cluster. By leveraging a reduced set of highly correlated monitoring metrics, our approach maintains low overhead while remaining adaptable to diverse co-location scenarios and heterogeneous hardware. The predictors achieve up to 95% accuracy while keeping the prediction delay within 10% of the application RTT. In addition, we identify the minimum prediction accuracy threshold and key system-level factors required to ensure effective predictor deployment in resource-constrained clusters. Simulation-based evaluation demonstrates that performance-aware load balancing can significantly reduce application RTT and minimize resource waste. These results highlight the feasibility of integrating predictive load balancing into future production systems.
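As a rough illustration of the idea described in the abstract, the sketch below trains a simple ridge-regression RTT predictor on a small set of per-replica monitoring metrics and uses it to route each request to the replica with the lowest predicted RTT. The metric names (cpu_util, gpu_util, net_throughput, queue_length), the linear model, and the synthetic training data are illustrative assumptions only; they are not the paper's actual feature set, model, or measurements.

```python
# Minimal sketch of performance-aware routing with a lightweight RTT predictor.
# Metric names and the ridge-regression model are illustrative assumptions, not
# the paper's actual reduced metric set or predictor.
import numpy as np

METRICS = ["cpu_util", "gpu_util", "net_throughput", "queue_length"]  # hypothetical reduced metric set


class RttPredictor:
    """Linear (ridge) RTT predictor trained on per-replica monitoring snapshots."""

    def __init__(self, l2: float = 1.0):
        self.l2 = l2
        self.weights = None  # shape: (len(METRICS) + 1,) once fitted

    def fit(self, X: np.ndarray, rtt: np.ndarray) -> None:
        # X: (n_samples, n_metrics) monitoring snapshots; rtt: observed round-trip times (ms).
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])        # append bias column
        reg = self.l2 * np.eye(Xb.shape[1])                   # L2 regularization
        self.weights = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ rtt)

    def predict(self, x: np.ndarray) -> float:
        xb = np.append(x, 1.0)
        return float(xb @ self.weights)


def route(candidates: dict[str, np.ndarray], predictor: RttPredictor) -> str:
    """Pick the replica with the lowest predicted RTT for the incoming request."""
    return min(candidates, key=lambda name: predictor.predict(candidates[name]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.uniform(0, 1, size=(500, len(METRICS)))
    # Synthetic ground truth: RTT grows with load on each metric, plus noise.
    rtt_train = 20 + X_train @ np.array([30.0, 50.0, 10.0, 40.0]) + rng.normal(0, 2, 500)

    predictor = RttPredictor()
    predictor.fit(X_train, rtt_train)

    replicas = {name: rng.uniform(0, 1, len(METRICS))
                for name in ("gpu-node-a", "gpu-node-b", "gpu-node-c")}
    best = route(replicas, predictor)
    print(f"route request to {best}, predicted RTT ~ {predictor.predict(replicas[best]):.1f} ms")
```

A linear model is used here only because it keeps per-request prediction overhead negligible, which matches the abstract's emphasis on lightweight predictors whose prediction delay stays small relative to the application RTT; the paper's actual predictor design may differ.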
Similar Papers
Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach
Networking and Internet Architecture
Predicts internet speed for faster apps.
Accurate Performance Predictors for Edge Computing Applications
Distributed, Parallel, and Cluster Computing
Helps computers guess how fast apps will run.
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Information Theory
Makes smart assistants answer faster and better.