Thompson Sampling for Multi-Objective Linear Contextual Bandit
By: Somangchan Park, Heesang Ann, Min-hwan Oh
Potential Business Impact:
Helps computers make better choices with many goals.
We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose \texttt{MOL-TS}, the \textit{first} Thompson Sampling algorithm with Pareto regret guarantees for this problem. Unlike standard approaches that compute an empirical Pareto front each round, \texttt{MOL-TS} samples parameters across objectives and efficiently selects an arm from a novel \emph{effective Pareto front}, which accounts for repeated selections over time. Our analysis shows that \texttt{MOL-TS} achieves a worst-case Pareto regret bound of $\widetilde{O}(d^{3/2}\sqrt{T})$, where $d$ is the dimension of the feature vectors and $T$ is the total number of rounds, matching the best known order for randomized linear bandit algorithms in the single-objective setting. Empirical results confirm the benefits of our proposed approach, demonstrating improved regret minimization and strong multi-objective performance.
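To make the setup concrete, here is a minimal sketch of multi-objective linear Thompson Sampling: a ridge-regression posterior per objective over a shared design matrix, one sampled parameter per objective each round, and an arm drawn from the Pareto front of the sampled reward vectors. This is a generic illustration of the problem setting, not the paper's \texttt{MOL-TS} algorithm; in particular, the paper's \emph{effective Pareto front} construction (which accounts for repeated selections over time) is not reproduced here, and all class and function names are hypothetical.

```python
import numpy as np

def pareto_front(rewards):
    # rewards: (n_arms, n_objectives) array of sampled mean rewards.
    # An arm is Pareto-optimal if no other arm dominates it
    # (>= in every objective and > in at least one).
    n = rewards.shape[0]
    optimal = []
    for i in range(n):
        dominated = any(
            np.all(rewards[j] >= rewards[i]) and np.any(rewards[j] > rewards[i])
            for j in range(n) if j != i
        )
        if not dominated:
            optimal.append(i)
    return optimal

class MultiObjectiveLinTS:
    # Simplified multi-objective linear TS (illustrative, not MOL-TS):
    # maintain one ridge-regression estimate per objective over a shared
    # Gram matrix; each round, sample a parameter per objective and play
    # an arm from the Pareto front of the sampled rewards.
    def __init__(self, d, n_objectives, v=1.0, lam=1.0, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.v = v                               # posterior scale
        self.A = lam * np.eye(d)                 # shared Gram matrix
        self.b = np.zeros((n_objectives, d))     # per-objective targets
        self.m = n_objectives

    def select_arm(self, X):
        # X: (n_arms, d) feature vectors for this round's arms.
        A_inv = np.linalg.inv(self.A)
        sampled = np.empty((X.shape[0], self.m))
        for k in range(self.m):
            theta_hat = A_inv @ self.b[k]
            theta = self.rng.multivariate_normal(theta_hat, self.v**2 * A_inv)
            sampled[:, k] = X @ theta
        front = pareto_front(sampled)
        return int(self.rng.choice(front))

    def update(self, x, rewards):
        # x: (d,) played arm's features; rewards: (n_objectives,) feedback.
        self.A += np.outer(x, x)
        self.b += np.outer(rewards, x)
```

Tie-breaking among Pareto-optimal arms is uniform here; the paper's contribution is precisely a more careful selection rule with Pareto regret guarantees.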
Similar Papers
Adaptive Data Augmentation for Thompson Sampling
Machine Learning (Stat)
Learns the best choices faster for rewards.
Constrained Linear Thompson Sampling
Machine Learning (CS)
Helps computers learn safely and faster.
Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits
Machine Learning (CS)
Helps computers learn better with less guessing.