Exploration by Optimisation in Partial Monitoring

Tor Lattimore; Csaba Szepesvari

Exploration by Optimisation in Partial Monitoring

Tor Lattimore, Csaba Szepesvari

[Proceedings link] [PDF]

Subject areas: Bandit problems, Online learning

Presented in: Session 2A, Session 2E

[Zoom link for poster in Session 2A], [Zoom link for poster in Session 2E]

Abstract

Abstract: We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. The highlight is that for the non-degenerate locally observable games, the n-round minimax regret is bounded by 2mk^(3/2)sqrt(3n log(k)), where m is the number of signals. This matches the best known information-theoretic upper bound derived via Bayesian minimax duality. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games. High probability bounds and simple experiments are also provided.

Exploration by Optimisation in Partial Monitoring

Tor Lattimore, Csaba Szepesvari

Summary presentation

Full presentation