Locally Private Hypothesis Selection
Sivakanth Gopi, Gautam Kamath, Janardhan D Kulkarni, Aleksandar Nikolov, Steven Wu, Huanyu Zhang
Subject areas: Privacy, fairness, Distribution learning/testing
Presented in: Session 1C, Session 4A
[Zoom link for poster in Session 1C], [Zoom link for poster in Session 4A]
Abstract:
We initiate the study of hypothesis selection under local differential privacy.\n Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution.\n This is a generalization of the classic problem of $k$-wise simple hypothesis testing, which corresponds to when $p \in \mathcal{Q}$, and we wish to identify $p$.\n Absent privacy constraints, this problem requires $O(\log k)$ samples from $p$, and it was recently shown that the same complexity is achievable under (central) differential privacy.\n However, the naive approach to this problem under local differential privacy would require $\tilde O(k^2)$ samples.\n\n We first show that the constraint of local differential privacy incurs an exponential increase in cost: any algorithm for this problem requires at least $\Omega(k)$ samples.\n Second, for the special case of $k$-wise simple hypothesis testing, we provide a non-interactive algorithm which nearly matches this bound, requiring $\tilde O(k)$ samples.\n Finally, we provide sequentially interactive algorithms for the general case, requiring $\tilde O(k)$ samples and only $O(\log \log k)$ rounds of interactivity.\n Our algorithms are achieved through a reduction to maximum selection with adversarial comparators, a problem of independent interest for which we initiate study in the parallel setting.\n For this problem, we provide a family of algorithms for each number of allowed rounds of interaction $t$, as well as lower bounds showing that they are near-optimal for every $t$.\n Notably, our algorithms result in exponential improvements on the round complexity of previous methods.