The Influence of Shape Constraints on the Thresholding Bandit Problem
James Cheshire, Pierre Menard, Alexandra Carpentier
Subject areas: Bandit problems, Convex optimization
Presented in: Session 2A, Session 2E
[Zoom link for poster in Session 2A], [Zoom link for poster in Session 2E]
Abstract:
We investigate the stochastic \emph{Thresholding Bandit problem} TBP under several \emph{shape constraints}. On top of (i) the vanilla, unstructured TBP, we consider the case where (ii) the sequence of arm's means $(\mu_k)_k$ is monotonically increasing MTBP, (iii) the case where $(\mu_k)_k$ is unimodal UTBP and (iv) the case where $(\mu_k)_k$ is concave CTBP. In the TBP problem the aim is to output, at the end of the sequential game, the set of arms whose means are above a given threshold. The regret is the highest gap between a misclassified arm and the threshold. \n In the fixed budget setting, we provide \emph{problem independent} minimax rates for the regret in all settings, as well as associated algorithms. We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for TBP, (ii) $\sqrt{\log(K)/T}$ for MTBP, (iii) $\sqrt{K/T}$ for UTBP and (iv) $\sqrt{\log\log K/T}$ for CTBP, where $K$ is the number of arms and $T$ is the budget. These rates demonstrate that \textit{the dependence on $K$} of the minimax regret varies significantly depending on the shape constraint. This highlights the fact that the shape constraints modify fundamentally the nature of the TBP.