Extrapolating the profile of a finite population
Yihong Wu, Yury Polyanskiy, Soham Jana
Subject areas: Distribution learning/testing, High-dimensional statistics
Presented in: Session 4A, Session 4C
[Zoom link for poster in Session 4A], [Zoom link for poster in Session 4C]
Abstract:
We study a prototypical problem in empirical Bayes. Namely,\n consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty).\n Without any structural restrictions, it is impossible to learn the \n composition of the full population having observed only a small (random) subsample of size $m = o(k)$. \n Nevertheless, we show that in the sublinear regime of $m =\omega(k/\log k)$, it is possible to consistently estimate in total variation \n the \emph{profile} of the population, defined as the empirical\n distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove\n that in the linear regime of $m=c k$ for any constant $c$ the optimal rate is $\Theta(1/\log k)$. \n Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k$.\n We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality.\n The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.