Multi-thresholding Good Arm Identification with Bandit Feedback
By: Xuanke Jiang, Sherief Hashima, Kohei Hatano, and more
Potential Business Impact:
Finds a good-enough option when there are many goals at once.
We consider a good arm identification problem in a stochastic bandit setting with multiple objectives, where each arm $i \in [K]$ is associated with a distribution $D_i$ defined over $\mathbb{R}^M$. In each round $t$, the player pulls an arm $i_t$ and receives an $M$-dimensional reward vector sampled according to $D_{i_t}$. The goal is to find, with high probability, an $\epsilon$-good arm, i.e., one whose expected reward vector is at least $\bm{\xi} - \epsilon \mathbf{1}$, where $\bm{\xi}$ is a predefined threshold vector and the vector comparison is component-wise. We propose the Multi-Thresholding UCB (MultiTUCB) algorithm and prove a sample complexity bound for it. Our bound matches the existing one in the special case where $M=1$ and $\epsilon=0$. The proposed algorithm outperforms baseline approaches on both synthetic and real datasets.
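The abstract specifies the problem but not MultiTUCB's internal rules, so the following is only a minimal Python sketch of the setting with a plausible UCB/LCB-style stopping rule. The function name `find_epsilon_good_arm`, the Hoeffding-type confidence radius, the bounded-reward assumption ($[0,1]^M$), and the sampling rule are all illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def find_epsilon_good_arm(sample, K, M, xi, eps, delta, max_pulls=100_000):
    """Return the index of an arm believed to be eps-good (mean reward
    >= xi - eps on every coordinate), or None if all arms are ruled out.
    `sample(i)` draws one M-dimensional reward in [0, 1]^M from D_i.
    Illustrative sketch only; not the paper's MultiTUCB."""
    sums = np.zeros((K, M))
    counts = np.zeros(K, dtype=int)
    active = set(range(K))
    for i in range(K):                       # pull each arm once to initialise
        sums[i] += sample(i)
        counts[i] = 1
    for _ in range(K, max_pulls):
        # Anytime Hoeffding radius with a crude union bound over arms,
        # coordinates, and pull counts (assumed, not the paper's bound).
        rad = np.sqrt(np.log(4 * K * M * (counts + 1) ** 2 / delta)
                      / (2 * counts))
        means = sums / counts[:, None]
        lcb = means - rad[:, None]
        ucb = means + rad[:, None]
        for i in list(active):
            if np.all(lcb[i] >= xi - eps):   # confidently eps-good on all goals
                return i
            if np.any(ucb[i] < xi - eps):    # some objective confidently too low
                active.discard(i)
        if not active:
            return None                      # no eps-good arm exists (w.h.p.)
        # Illustrative sampling rule: pull the active arm whose worst
        # objective looks most promising relative to its threshold.
        i_t = max(active, key=lambda i: np.min(ucb[i] - (xi - eps)))
        sums[i_t] += sample(i_t)
        counts[i_t] += 1
    return None

# Example: 3 arms, 2 objectives, Bernoulli rewards per coordinate.
mu = np.array([[0.3, 0.9], [0.7, 0.6], [0.8, 0.2]])
arm = find_epsilon_good_arm(
    lambda i: (np.random.random(2) < mu[i]).astype(float),
    K=3, M=2, xi=np.array([0.6, 0.5]), eps=0.05, delta=0.05)
print(arm)  # only arm 1 meets xi - eps = [0.55, 0.45] on both objectives
```

Note the two-sided test: an arm is returned once its lower confidence bound clears $\bm{\xi} - \epsilon\mathbf{1}$ on every coordinate, and discarded once its upper confidence bound falls below the threshold on any coordinate, which is the natural component-wise extension of single-objective thresholding bandits.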
Similar Papers
Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms
Machine Learning (CS)
Finds the best option near a target.
Learning to Coordinate Under Threshold Rewards: A Cooperative Multi-Agent Bandit Framework
Multiagent Systems
Helps robots work together to get rewards.
Near Optimal Best Arm Identification for Clustered Bandits
Machine Learning (CS)
Helps many robots learn the best action faster.