Exploration by Optimisation in Partial Monitoring
Tor Lattimore, Csaba Szepesvari
Subject areas: Bandit problems, Online learning
Presented in: Session 2A, Session 2E
[Zoom link for poster in Session 2A], [Zoom link for poster in Session 2E]
Abstract:
We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the n-round minimax regret is bounded by 2mk^(3/2)sqrt(3n log(k)), where m is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.