Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
Sebastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke
Subject areas: Bandit problems
Presented in: Session 3B, Session 3D
We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication and no shared randomness between the players; moreover, when two or more players select the same action, the result is a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem in the two-player case, under the feedback model where collisions are announced to the colliding players. We also prove the first sublinear regret guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$, where $m$ is the number of players.
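As a rough illustration of the collision model and the two feedback settings described above, the following is a minimal sketch and not the authors' algorithm. It assumes losses lie in $[0, 1]$, so that the maximal loss is 1, and that every colliding player incurs that maximal loss. The function names `round_losses`, `collision_flags`, and `no_collision_rate_exponent` are hypothetical and chosen only for this example.

```python
from collections import Counter
from typing import List, Sequence


def round_losses(actions: Sequence[int],
                 losses: Sequence[float],
                 max_loss: float = 1.0) -> List[float]:
    """Per-player loss for one round.

    actions[i] is the arm chosen by player i, and losses[a] is the
    adversarially chosen loss of arm a in this round.
    Assumption: each player involved in a collision incurs max_loss.
    """
    counts = Counter(actions)
    return [max_loss if counts[a] > 1 else losses[a] for a in actions]


def collision_flags(actions: Sequence[int]) -> List[bool]:
    """Collision feedback: flag i is True if player i shared its arm.

    In the feedback model without collision information, players would
    not observe these flags.
    """
    counts = Counter(actions)
    return [counts[a] > 1 for a in actions]


def no_collision_rate_exponent(m: int) -> float:
    """Exponent of T in the stated rate T^(1 - 1/(2m)) for m players."""
    return 1.0 - 1.0 / (2 * m)
```

For example, with three players choosing arms `[0, 0, 2]`, the first two collide and would each incur the maximal loss under the assumption above, while the third incurs the loss of arm 2. For $m = 2$ the exponent in the no-collision-information rate is $3/4$, and it approaches 1 as $m$ grows.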