Sequential prediction under log-loss and misspecification
Meir Feder, Yury Polyanskiy
Session: Online Learning, Game Theory 2 (B)
Session Chair: Vidya K Muthukumar
Poster: Poster Session 2
Abstract:
We consider the question of sequential prediction under the log-loss in terms of cumulative regret. Namely,
given a hypothesis class of distributions, a learner sequentially predicts the (distribution of the) next letter in a sequence
and its performance is compared to the baseline of the best constant predictor from the hypothesis class. The
well-specified case corresponds to an additional assumption that the data-generating distribution belongs to the
hypothesis class as well. Here we present results in the more general misspecified case. Due to special properties of the
log-loss, the same problem arises in the context of competitive optimality in density estimation and in model selection.
For the $d$-dimensional Gaussian location hypothesis class, we show that the cumulative regrets
in the well-specified and misspecified cases asymptotically coincide. In other words, we
provide an $o(1)$ characterization of the distribution-free (or PAC) regret in this case -- the first such result
as far as we know. We recall that the
worst-case (or individual-sequence) regret in this case is larger by an additive constant ${d\over 2} + o(1)$.
Surprisingly, neither the traditional Bayesian estimators nor Shtarkov's normalized maximum
likelihood achieves the PAC regret, and our estimator requires special ``robustification'' against heavy-tailed data.
In addition, we show two general results for misspecified regret: the existence and uniqueness of the optimal
estimator, and a bound sandwiching the misspecified regret between well-specified regrets with (asymptotically)
close hypothesis classes.
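
For concreteness, the three regret notions compared above can be sketched as follows. The notation is an assumption of this sketch and does not appear in the abstract: $\{P_\theta\}$ denotes the hypothesis class, $Q$ the learner's joint predictive distribution on sequences $x^n = (x_1,\dots,x_n)$, and $X^n$ is taken to be i.i.d. from a data-generating distribution $P$.

\[
\mathrm{Reg}_n(Q,P) \;=\; \mathbb{E}_{P}\!\left[\log\frac{1}{Q(X^n)}\right] \;-\; \inf_{\theta}\,\mathbb{E}_{P}\!\left[\log\frac{1}{P_\theta(X^n)}\right]
\]
\[
\text{well-specified:}\quad \inf_{Q}\,\sup_{P\in\{P_\theta\}} \mathrm{Reg}_n(Q,P),
\qquad
\text{misspecified (PAC):}\quad \inf_{Q}\,\sup_{P\ \text{arbitrary}} \mathrm{Reg}_n(Q,P),
\]
\[
\text{individual-sequence:}\quad \inf_{Q}\,\sup_{x^n}\,\log\frac{\sup_{\theta} P_\theta(x^n)}{Q(x^n)}.
\]

Under these definitions the three quantities are ordered: well-specified $\le$ misspecified $\le$ individual-sequence. The first inequality holds because the supremum is taken over a larger set of distributions, and the second because $\inf_\theta \mathbb{E}_P[\cdot] \ge \mathbb{E}_P[\inf_\theta\,\cdot\,]$ and an expectation never exceeds a supremum.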