Morpheus: Lightweight RTT Prediction for Performance-Aware Load Balancing
By: Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi and more
Potential Business Impact:
Predicts app response delays so requests go to the fastest servers, making apps run faster.
Distributed applications increasingly demand low end-to-end latency, especially in edge and cloud environments where co-located workloads contend for limited resources. Traditional load-balancing strategies are typically reactive and rely on outdated or coarse-grained metrics, often leading to suboptimal routing decisions and increased tail latencies. This paper investigates the use of round-trip time (RTT) predictors to enhance request routing by anticipating application latency. We develop lightweight and accurate RTT predictors that are trained on time-series monitoring data collected from a Kubernetes-managed GPU cluster. By leveraging a reduced set of highly correlated monitoring metrics, our approach maintains low overhead while remaining adaptable to diverse co-location scenarios and heterogeneous hardware. The predictors achieve up to 95% accuracy while keeping the prediction delay within 10% of the application RTT. In addition, we identify the minimum prediction accuracy threshold and key system-level factors required to ensure effective predictor deployment in resource-constrained clusters. Simulation-based evaluation demonstrates that performance-aware load balancing can significantly reduce application RTT and minimize resource waste. These results highlight the feasibility of integrating predictive load balancing into future production systems.
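As a rough illustration of the idea described in the abstract, the sketch below trains a simple ridge-regression RTT predictor on a small set of per-replica monitoring metrics and uses it to route each request to the replica with the lowest predicted RTT. The metric names (cpu_util, gpu_util, net_throughput, queue_length), the linear model, and the synthetic training data are illustrative assumptions only; they are not the paper's actual feature set, model, or measurements.

```python
# Minimal sketch of performance-aware routing with a lightweight RTT predictor.
# Metric names and the ridge-regression model are illustrative assumptions, not
# the paper's actual reduced metric set or predictor.
import numpy as np

METRICS = ["cpu_util", "gpu_util", "net_throughput", "queue_length"]  # hypothetical reduced metric set


class RttPredictor:
    """Linear (ridge) RTT predictor trained on per-replica monitoring snapshots."""

    def __init__(self, l2: float = 1.0):
        self.l2 = l2
        self.weights = None  # shape: (len(METRICS) + 1,) once fitted

    def fit(self, X: np.ndarray, rtt: np.ndarray) -> None:
        # X: (n_samples, n_metrics) monitoring snapshots; rtt: observed round-trip times (ms).
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])        # append bias column
        reg = self.l2 * np.eye(Xb.shape[1])                   # L2 regularization
        self.weights = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ rtt)

    def predict(self, x: np.ndarray) -> float:
        xb = np.append(x, 1.0)
        return float(xb @ self.weights)


def route(candidates: dict[str, np.ndarray], predictor: RttPredictor) -> str:
    """Pick the replica with the lowest predicted RTT for the incoming request."""
    return min(candidates, key=lambda name: predictor.predict(candidates[name]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.uniform(0, 1, size=(500, len(METRICS)))
    # Synthetic ground truth: RTT grows with load on each metric, plus noise.
    rtt_train = 20 + X_train @ np.array([30.0, 50.0, 10.0, 40.0]) + rng.normal(0, 2, 500)

    predictor = RttPredictor()
    predictor.fit(X_train, rtt_train)

    replicas = {name: rng.uniform(0, 1, len(METRICS))
                for name in ("gpu-node-a", "gpu-node-b", "gpu-node-c")}
    best = route(replicas, predictor)
    print(f"route request to {best}, predicted RTT ~ {predictor.predict(replicas[best]):.1f} ms")
```

A linear model is used here only because it keeps per-request prediction overhead negligible, which matches the abstract's emphasis on lightweight predictors whose prediction delay stays small relative to the application RTT; the paper's actual predictor design may differ.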
Similar Papers
Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach
Networking and Internet Architecture
Predicts internet speed for faster apps.
Accurate Performance Predictors for Edge Computing Applications
Distributed, Parallel, and Cluster Computing
Helps computers guess how fast apps will run.
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Information Theory
Makes smart assistants answer faster and better.