Prediction-Guided Control in Data Center Networks
By: Kevin Zhao, Chenning Li, Anton A. Zabreyko, and more
Potential Business Impact:
Fixes slow internet in computer centers.
In this paper, we design, implement, and evaluate Polyphony, a system that gives network operators a new way to control, and reduce the frequency of, poor tail latency events in multi-class data center networks on the time scale of minutes. Polyphony is designed to complement other adaptive mechanisms such as congestion control and traffic engineering, but it targets aspects of network operation that have previously been treated as static. Prior model-free optimization methods work best when there are only a few relevant degrees of freedom and when workloads and measurements are stable; these assumptions do not hold in modern data center networks. Polyphony instead develops novel methods for measuring, predicting, and controlling network quality of service (QoS) metrics under a dynamically changing workload. First, we monitor and aggregate workloads on a network-wide basis. Next, we feed the result into an approximate counterfactual prediction engine that estimates how candidate network configuration changes would affect QoS. Finally, we apply the best candidate and repeat in a closed loop, aiming to converge rapidly and stably to a configuration that meets operator goals. Using CloudLab on a simple topology, we observe that Polyphony converges to tight service-level objectives (SLOs) within ten minutes and re-stabilizes within fifteen minutes after large workload shifts, while the prior state of the art fails to adapt.
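The abstract does not specify Polyphony's configuration knobs, predictor, or search strategy. The following is only a toy Python sketch of the measure, predict, and apply loop it describes, built on several assumptions. There is a single hypothetical knob: the link share, in percent, given to a "latency_sensitive" class, with the remainder going to a "bulk" class. A made-up queueing-style formula (`predict_p99`) stands in for the counterfactual prediction engine. Restricting each step to neighboring configurations is one possible way to keep convergence stable. All names (`aggregate`, `predict_p99`, `cost`, `candidates`, `control_loop`) are illustrative and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical traffic classes; offered load is a fraction of link capacity.
CLASSES = ("latency_sensitive", "bulk")


def aggregate(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Network-wide aggregation: mean offered load per class across measurement points."""
    return {c: sum(s[c] for s in samples) / len(samples) for c in CLASSES}


def predict_p99(load: float, share: float, base_us: float = 10.0) -> float:
    """Toy stand-in for a counterfactual predictor (an assumption, not the paper's model):
    tail latency grows like base / (1 - utilization) of the class's assigned share."""
    rho = load / share
    if rho >= 1.0:
        return float("inf")  # predicted overload
    return base_us / (1.0 - rho)


def cost(workload: Dict[str, float], share_ls: float, slo_us: float) -> float:
    """Predicted SLO violation for the latency-sensitive class, plus a small
    penalty on bulk latency so that compliant configs that favor bulk win ties."""
    ls = predict_p99(workload["latency_sensitive"], share_ls)
    bulk = predict_p99(workload["bulk"], 1.0 - share_ls)
    return max(0.0, ls - slo_us) + 0.01 * bulk


def candidates(current_pct: int, step_pct: int = 5) -> List[int]:
    """Current config first (min() keeps it on ties), then small moves around it."""
    options = (current_pct, current_pct - step_pct, current_pct + step_pct)
    return [p for p in options if 5 <= p <= 95]


def control_loop(windows: List[List[Dict[str, float]]], slo_us: float,
                 start_pct: int = 50) -> List[int]:
    """Each window: aggregate measurements, predict every candidate, apply the best."""
    pct = start_pct
    history = []
    for samples in windows:
        workload = aggregate(samples)
        pct = min(candidates(pct), key=lambda p: cost(workload, p / 100, slo_us))
        history.append(pct)  # "apply" the chosen configuration
    return history
```

For example, the two synthetic phases below (two measurement points per window) have a latency-sensitive load of about 0.3 followed by a shift to about 0.4. With `slo_us=20.0`, the loop settles at 60% and then re-settles at 80% after the shift: `[55, 60, 60, 65, 70, 75, 80, 80]`. The bounded step size trades convergence speed for stability. A real system would also have to cope with noisy measurements and prediction error, which this sketch ignores.

```python
phase1 = [[{"latency_sensitive": 0.2, "bulk": 0.4}, {"latency_sensitive": 0.4, "bulk": 0.2}]] * 3
phase2 = [[{"latency_sensitive": 0.3, "bulk": 0.1}, {"latency_sensitive": 0.5, "bulk": 0.2}]] * 5
print(control_loop(phase1 + phase2, slo_us=20.0))
```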
Similar Papers
Stable and Fault-Tolerant Decentralized Traffic Engineering
Networking and Internet Architecture
Keeps internet traffic flowing smoothly and safely.
System-Level Performance and Communication Tradeoff in Networked Control with Predictions
Systems and Control
Helps robots work together without crashing.
Performance Analysis of Dynamic Equilibria in Joint Path Selection and Congestion Control
Networking and Internet Architecture
Fixes internet slowdowns from too many paths.