Experimental Design for Semiparametric Bandits
By: Seok-Jin Kim, Gi-Soo Kim, Min-hwan Oh
Potential Business Impact:
Finds the best choice even when things change.
We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{O}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.
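The reward model and the centering idea behind orthogonalized regression can be illustrated with a minimal simulation. This is an assumption-laden sketch, not the paper's experimental-design algorithm: rewards follow $r_t = x_{a_t}^\top\theta + \nu_t + \varepsilon_t$ with a shift $\nu_t$ common to all arms, arms are drawn from a fixed uniform distribution for simplicity, and the estimator regresses rewards on features centered by the sampling mean, which is one standard instance of orthogonalization (the centered feature has mean zero, so the shift term cancels in expectation).

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 8, 20000

# Illustrative arm features and true linear parameter.
X = rng.normal(size=(K, d))
theta = rng.normal(size=d)

pi = np.full(K, 1.0 / K)   # fixed uniform sampling distribution (simplification)
mu = pi @ X                # E_{a ~ pi}[x_a], used to center the features

Z = np.zeros((T, d))       # centered features x_{a_t} - mu
Xraw = np.zeros((T, d))    # raw features, for a naive baseline fit
R = np.zeros(T)            # observed rewards

for t in range(T):
    a = rng.choice(K, p=pi)
    nu = 5.0 + np.sin(t)   # unknown shift, identical across arms at round t
    R[t] = X[a] @ theta + nu + 0.1 * rng.normal()
    Xraw[t] = X[a]
    Z[t] = X[a] - mu

# Orthogonalized (centered) regression: the shift nu_t is uncorrelated with
# the mean-zero centered features, so the fit stays consistent for theta.
theta_centered = np.linalg.lstsq(Z, R, rcond=None)[0]
# Naive regression on raw features absorbs the shift into the estimate.
theta_naive = np.linalg.lstsq(Xraw, R, rcond=None)[0]

print("centered error:", np.linalg.norm(theta_centered - theta))
print("naive error:   ", np.linalg.norm(theta_naive - theta))
```

The centered fit typically recovers $\theta$ closely despite the shift, while the naive fit is biased whenever the mean feature $\mu$ is nonzero; the paper's contribution is a sharper non-asymptotic analysis of this kind of orthogonalized estimator, together with an experimental design over arms rather than the uniform sampling assumed here.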
Similar Papers
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget
Machine Learning (CS)
Finds the best option quickly with limited money.
On the Hardness of Bandit Learning
Machine Learning (CS)
Finds the best option faster with fewer tries.
On the optimal regret of collaborative personalized linear bandits
Machine Learning (CS)
Helps many AI agents learn faster together.