Tsallis-INF: Improved Robustness to Adversarial Corruptions in Stochastic Multi-armed Bandits and Beyond
Saeed Masoudian, Yevgeny Seldin
Session: Bandits, RL and Control 2 (B)
Session Chair: Sattar Vakili
Poster: Poster Session 3
Abstract:
We derive improved regret bounds for the Tsallis-INF algorithm of \citet{zimmert2020}. We show that in adversarial regimes with a $(\Delta,C,T)$ self-bounding constraint the algorithm achieves
$\mathcal{O}\left(\left(\sum_{i\neq i^*} \frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{\left(\sum_{i\neq i^*} \frac{1}{\Delta_i}\right)^2}\right)+\sqrt{C\left(\sum_{i\neq i^*}\frac{1}{\Delta_i}\right)\log_+\left(\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{\Delta_i}}\right)}\right)$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max(1,\log x)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases.
Additionally, we provide a general analysis, which makes it possible to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multi-armed bandits.
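The abstract does not include pseudocode, so the following is only a rough Python sketch of the Tsallis-INF sampling rule with power $\alpha = 1/2$: the played distribution solves $w_i = 4\,(\eta_t(\hat L_i - x))^{-2}$ with the normalization constant $x < \min_i \hat L_i$ chosen so the weights sum to one, and losses are estimated by importance weighting. The learning-rate schedule $\eta_t = 2/\sqrt{t}$, the bisection solver, and all constants are assumptions of this sketch, not details taken from the abstract.

```python
import math
import random

def tsallis_inf_weights(L_hat, eta, iters=100):
    """Sampling distribution of 1/2-Tsallis-INF:
    w_i = 4 / (eta * (L_hat[i] - x))**2, with x < min(L_hat) chosen
    so the weights sum to 1 (found here by bisection; assumed solver)."""
    K = len(L_hat)
    L_min = min(L_hat)
    # At x = L_min - 2/eta the leading arm's weight alone is 1 (sum >= 1);
    # at x = L_min - 2*sqrt(K)/eta every weight is at most 1/K (sum <= 1).
    lo, hi = L_min - 2.0 * math.sqrt(K) / eta, L_min - 2.0 / eta
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (l - x)) ** 2 for l in L_hat)
        if total < 1.0:
            lo = x  # move x toward min(L_hat) to increase the weights
        else:
            hi = x
    return [4.0 / (eta * (l - x)) ** 2 for l in L_hat]

def tsallis_inf(loss_fn, K, T, seed=0):
    """Run the sketch with importance-weighted loss estimates.
    loss_fn(t, arm) returns the observed loss in [0, 1]."""
    rng = random.Random(seed)
    L_hat = [0.0] * K  # cumulative importance-weighted loss estimates
    arms_played = []
    for t in range(1, T + 1):
        eta = 2.0 / math.sqrt(t)  # assumed learning-rate schedule
        p = tsallis_inf_weights(L_hat, eta)
        arm = rng.choices(range(K), weights=p)[0]
        L_hat[arm] += loss_fn(t, arm) / p[arm]  # importance weighting
        arms_played.append(arm)
    return arms_played
```

On a toy instance where one arm has strictly smaller loss, the sketch concentrates its plays on that arm, which is the behavior the self-bounding analysis exploits in the stochastic regime.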