Multi-thresholding Good Arm Identification with Bandit Feedback
By: Xuanke Jiang, Sherief Hashima, Kohei Hatano, and more
Potential Business Impact:
Finds a good-enough option when there are many goals at once.
We consider a good arm identification problem in a stochastic bandit setting with multiple objectives, where each arm $i \in [K]$ is associated with a distribution $D_i$ defined over $\mathbb{R}^M$. In each round $t$, the player pulls an arm $i_t$ and receives an $M$-dimensional reward vector sampled according to $D_{i_t}$. The goal is to find, with high probability, an $\epsilon$-good arm, i.e., one whose expected reward vector is at least $\bm{\xi} - \epsilon \mathbf{1}$, where $\bm{\xi}$ is a predefined threshold vector and the vector comparison is component-wise. We propose the Multi-Thresholding UCB (MultiTUCB) algorithm and prove a sample complexity bound for it. Our bound matches the existing one in the special case where $M=1$ and $\epsilon=0$. The proposed algorithm outperforms baseline approaches on both synthetic and real datasets.
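The abstract specifies the problem but not MultiTUCB's internal rules, so the following is only a minimal Python sketch of the setting with a plausible UCB/LCB-style stopping rule. The function name `find_epsilon_good_arm`, the Hoeffding-type confidence radius, the bounded-reward assumption ($[0,1]^M$), and the sampling rule are all illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def find_epsilon_good_arm(sample, K, M, xi, eps, delta, max_pulls=100_000):
    """Return the index of an arm believed to be eps-good (mean reward
    >= xi - eps on every coordinate), or None if all arms are ruled out.
    `sample(i)` draws one M-dimensional reward in [0, 1]^M from D_i.
    Illustrative sketch only; not the paper's MultiTUCB."""
    sums = np.zeros((K, M))
    counts = np.zeros(K, dtype=int)
    active = set(range(K))
    for i in range(K):                       # pull each arm once to initialise
        sums[i] += sample(i)
        counts[i] = 1
    for _ in range(K, max_pulls):
        # Anytime Hoeffding radius with a crude union bound over arms,
        # coordinates, and pull counts (assumed, not the paper's bound).
        rad = np.sqrt(np.log(4 * K * M * (counts + 1) ** 2 / delta)
                      / (2 * counts))
        means = sums / counts[:, None]
        lcb = means - rad[:, None]
        ucb = means + rad[:, None]
        for i in list(active):
            if np.all(lcb[i] >= xi - eps):   # confidently eps-good on all goals
                return i
            if np.any(ucb[i] < xi - eps):    # some objective confidently too low
                active.discard(i)
        if not active:
            return None                      # no eps-good arm exists (w.h.p.)
        # Illustrative sampling rule: pull the active arm whose worst
        # objective looks most promising relative to its threshold.
        i_t = max(active, key=lambda i: np.min(ucb[i] - (xi - eps)))
        sums[i_t] += sample(i_t)
        counts[i_t] += 1
    return None

# Example: 3 arms, 2 objectives, Bernoulli rewards per coordinate.
mu = np.array([[0.3, 0.9], [0.7, 0.6], [0.8, 0.2]])
arm = find_epsilon_good_arm(
    lambda i: (np.random.random(2) < mu[i]).astype(float),
    K=3, M=2, xi=np.array([0.6, 0.5]), eps=0.05, delta=0.05)
print(arm)  # only arm 1 meets xi - eps = [0.55, 0.45] on both objectives
```

Note the two-sided test: an arm is returned once its lower confidence bound clears $\bm{\xi} - \epsilon\mathbf{1}$ on every coordinate, and discarded once its upper confidence bound falls below the threshold on any coordinate, which is the natural component-wise extension of single-objective thresholding bandits.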
Similar Papers
Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms
Machine Learning (CS)
Finds the best option near a target.
Learning to Coordinate Under Threshold Rewards: A Cooperative Multi-Agent Bandit Framework
Multiagent Systems
Helps robots work together to get rewards.
Near Optimal Best Arm Identification for Clustered Bandits
Machine Learning (CS)
Helps many robots learn the best action faster.