Data-driven confidence bands for distributed nonparametric regression
Valeriy Avanesov
Subject areas: Kernel methods, Excess risk bounds and generalization error bounds, Regression, Sampling algorithms, Supervised learning
Presented in: Session 1A, Session 1E
Abstract:
Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity (exact solvers scale cubically with the sample size), which renders them inapplicable to modern massive datasets. To address this, a number of approximations have been proposed, some of which allow for a distributed implementation. One of them is the divide-and-conquer approach: the data are split into a number of partitions, a local estimate is obtained on each partition, and the local estimates are finally averaged. In this paper we propose a novel, computationally efficient, fully data-driven algorithm that quantifies the uncertainty of this method, yielding frequentist $L_2$-confidence bands. We rigorously demonstrate the validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing the known risk bounds.
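The divide-and-conquer estimator mentioned in the abstract (split, fit locally, average) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes one-dimensional inputs, a Gaussian (RBF) kernel, a fixed regularization parameter and random equal-size partitions. The function names and default parameters (`bandwidth`, `lam`, `n_parts`) are hypothetical, and the sketch covers only the averaged point estimate, not the proposed data-driven confidence bands.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=0.2):
    # Gaussian (RBF) kernel matrix between 1-D input arrays a and b
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2 * bandwidth ** 2))

def krr_fit_predict(x_train, y_train, x_test, lam=1e-3, bandwidth=0.2):
    # Exact kernel ridge regression: solve (K + n * lam * I) alpha = y,
    # which costs O(n^3) in the number n of training points
    n = len(x_train)
    K = rbf_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return rbf_kernel(x_test, x_train, bandwidth) @ alpha

def divide_and_conquer_krr(x, y, x_test, n_parts=4, lam=1e-3, bandwidth=0.2, seed=0):
    # Randomly split the sample indices into n_parts partitions of (nearly) equal size
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(x)), n_parts)
    # Local KRR estimates; the fits are independent and could run on separate machines
    local = [krr_fit_predict(x[p], y[p], x_test, lam, bandwidth) for p in parts]
    # The averaged estimator is the mean of the local estimates
    return np.mean(local, axis=0)

# Usage on synthetic data (hypothetical example): noisy observations of sin(2*pi*x)
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(400)
x_test = np.linspace(0.0, 1.0, 50)
f_hat = divide_and_conquer_krr(x, y, x_test)
```

With `n_parts` partitions of size `n / n_parts`, each local solve costs on the order of `(n / n_parts)^3`, which is where the computational savings of the averaged estimator come from; with `n_parts=1` the sketch reduces to ordinary exact KRR on the full sample.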