BALLAST: Bandit-Assisted Learning for Latency-Aware Stable Timeouts in Raft
By: Qizhi Wang
Randomized election timeouts are a simple and effective liveness heuristic for Raft, but they become brittle under long-tail latency, jitter, and partition recovery, where repeated split votes can inflate unavailability. This paper presents BALLAST, a lightweight online adaptation mechanism that replaces static timeout heuristics with contextual bandits. BALLAST selects from a discrete set of timeout "arms" using efficient linear contextual bandits (LinUCB variants), and augments learning with safe exploration to cap risk during unstable periods. We evaluate BALLAST on a reproducible discrete-event simulation with long-tail delay, loss, correlated bursts, node heterogeneity, and partition/recovery turbulence. Across challenging WAN regimes, BALLAST substantially reduces recovery time and unwritable time compared to standard randomized timeouts and common heuristics, while remaining competitive on stable LAN/WAN settings.
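The arm-selection step described in the abstract can be illustrated with a minimal sketch of disjoint LinUCB over a discrete set of timeout arms. This is not the paper's implementation. The arm values, the context features, the reward definition, and the rule used for safe exploration (restricting choices to timeouts at or above a floor while an `unstable` flag is set) are all assumptions made for illustration, as are the names `LinUCBArm`, `select_timeout`, `TIMEOUT_ARMS_MS`, and `safe_floor_ms`.

```python
import math

# Hypothetical timeout arms in milliseconds; the abstract does not list the actual set.
TIMEOUT_ARMS_MS = [150, 300, 600, 1200]


class LinUCBArm:
    """Disjoint LinUCB state for one timeout arm (ridge regression on the context)."""

    def __init__(self, dim, ridge=1.0):
        # A_inv holds the inverse of (ridge * I + sum of x x^T), updated incrementally.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus an exploration bonus scaled by alpha."""
        theta = self._A_inv_times(self.b)
        Ax = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(a * xi for a, xi in zip(Ax, x)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse:
        # (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        Ax = self._A_inv_times(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]


def select_timeout(arms, x, alpha, unstable, safe_floor_ms=300):
    """Return the index of the arm with the highest UCB score.

    Assumed safe-exploration rule: while `unstable` is True, only arms with a
    timeout of at least `safe_floor_ms` are eligible.
    """
    candidates = [i for i, t in enumerate(TIMEOUT_ARMS_MS)
                  if not unstable or t >= safe_floor_ms]
    return max(candidates, key=lambda i: arms[i].ucb(x, alpha))


# Usage: the context is a hypothetical feature vector, e.g. [bias, normalized RTT estimate].
arms = [LinUCBArm(dim=2) for _ in TIMEOUT_ARMS_MS]
context = [1.0, 0.5]
chosen = select_timeout(arms, context, alpha=1.0, unstable=True)
# The reward is assumed to be higher for faster, cleaner elections.
arms[chosen].update(context, reward=1.0)
```

Keeping one small model per arm makes each decision cheap (a few matrix-vector products in the context dimension), which is in keeping with the lightweight design the abstract describes. The instability signal and floor value would have to come from the system's own measurements.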