Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm
By: Mansoor Davoodi, Setareh Maghsudi
Potential Business Impact:
Helps computers choose the best options when balancing many goals.
Multi-armed bandit (MAB) problems are widely applied to online optimization tasks that require balancing exploration and exploitation. In practical scenarios, these tasks often involve multiple conflicting objectives, giving rise to multi-objective multi-armed bandits (MO-MAB). Existing MO-MAB approaches predominantly rely on the Pareto regret metric introduced by Drugan and Nowé (2013). However, this metric has notable limitations, particularly its failure to account for all Pareto-optimal arms simultaneously. To address these challenges, we propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives. Additionally, we introduce the concept of efficient Pareto-optimal arms, which are specifically designed for online optimization. Based on our new metric, we develop a two-phase MO-MAB algorithm that achieves sublinear regret for both Pareto-optimal and efficient Pareto-optimal arms.
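To make the Pareto regret notion the abstract refers to concrete: in the Drugan and Nowé formulation, an arm's per-pull regret is the smallest uniform boost eps such that its mean reward vector, shifted by eps in every objective, is no longer dominated by any other arm. The Python below is a minimal sketch of that standard metric under assumed names (pareto_front, pareto_regret) and known mean vectors; it illustrates the existing baseline, not the authors' proposed metric or two-phase algorithm.

```python
# Minimal sketch of the Pareto front and the classic Pareto regret of
# Drugan & Nowe (2013), assuming mean reward vectors are known.
# Function names and the uniform-boost construction are illustrative.
import numpy as np

def pareto_front(means: np.ndarray) -> list[int]:
    """Indices of non-dominated arms. means has shape (n_arms, n_objectives)."""
    front = []
    for i, mu in enumerate(means):
        # Arm i is dominated if some other arm is >= in every objective
        # and strictly > in at least one.
        dominated = any(
            np.all(other >= mu) and np.any(other > mu)
            for j, other in enumerate(means) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def pareto_regret(means: np.ndarray, i: int) -> float:
    """Infimal eps >= 0 such that means[i] + eps * ones is non-dominated,
    i.e., the per-pull Pareto regret of arm i."""
    gaps = []
    for j in range(len(means)):
        if j == i:
            continue
        diff = means[j] - means[i]
        if np.all(diff >= 0) and np.any(diff > 0):  # arm j dominates arm i
            gaps.append(diff.min())  # boost needed to escape domination by j
    return max(gaps, default=0.0)   # non-dominated arms incur zero regret
```

For example, with means = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.2, 0.2]]), pareto_front(means) returns [0, 1, 2] and pareto_regret(means, 3) returns 0.3, since arm 3 needs a uniform boost of 0.3 to escape domination by arm 2. The abstract's criticism is visible here: a pull of any non-dominated arm contributes zero regret, so this metric alone cannot enforce balanced coverage of all Pareto-optimal arms.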
Similar Papers
Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms
Machine Learning (Stat)
Helps computers choose the best option, even when outcomes are risky.
Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
Machine Learning (CS)
Fairly shares rewards, making systems work better.
Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference
Machine Learning (CS)
Helps smart systems learn faster by using information from their neighbors.