Improved Regret for Zeroth-Order Stochastic Convex Bandits
Tor Lattimore, Andras Gyorgy
Session: Bandits, RL and Control 1 (A)
Session Chair: Yuxin Chen
Poster: Poster Session 2
Abstract:
We present an efficient algorithm for stochastic bandit convex optimisation that requires no assumptions on smoothness or strong convexity and whose regret is bounded by O(d^(4.5) sqrt(n) polylog(n)), where n is the number of interactions and d is the dimension.
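
For readers unfamiliar with the quantity being bounded, the following is a sketch of the usual regret definition in this setting, together with the bound stated in the abstract. The symbols K, f, x_t and R_n are assumptions introduced here for illustration and do not appear in the abstract. The paper's own formulation may differ in details such as the noise model, or whether the bound holds in expectation or with high probability.

```latex
% Assumed notation (not from the abstract):
%   K \subset \mathbb{R}^d  : convex domain
%   f : K \to \mathbb{R}    : unknown convex loss function
%   x_t \in K               : point played in round t; only a noisy value of f(x_t) is observed
\[
  R_n \;=\; \sum_{t=1}^{n} \Bigl( f(x_t) - \inf_{x \in K} f(x) \Bigr),
  \qquad
  R_n \;=\; O\bigl( d^{4.5} \sqrt{n}\, \mathrm{polylog}(n) \bigr).
\]
```

Under a bound of this form, the average regret per round R_n / n shrinks like d^(4.5) polylog(n) / sqrt(n), so for fixed d it tends to zero as n grows.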