Online Newton Method for Bandit Convex Optimisation

Fokkema, Hidde; van der Hoeven, Dirk; Lattimore, Tor; Mayo, Jack J.

Session: 2C-1 (Bandits), Monday, Jul 01, 15:00-16:15

Abstract: We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \polylog(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $d^{2} \sqrt{n} \polylog(n, d)$.