Abstract: In the era of big data, subsampling has become a common practice in statistical learning. By selecting a subgroup of individuals on which the learner is trained, subsampling aims to reduce the computational cost and time of the estimation step and, ideally, its energy consumption and carbon footprint. This work focuses on a nonparametric setting in which the hypothesis set lies in a reproducing kernel Hilbert space and the estimator is a minimizer of an empirical risk reweighted à la Horvitz-Thompson. By studying the asymptotic properties of this estimator, we reveal an optimal subsampling scheme (with respect to the trace of the covariance operator) and show that it can be used in practice via a plug-in approach. A numerical study on synthetic and real-world datasets shows the practicability and the benefit of the proposed approach.
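As an illustration of the estimator described above, here is a minimal sketch of a Horvitz-Thompson reweighted empirical risk minimizer in an RKHS. It rests on several assumptions that are not taken from the paper: squared loss with a ridge penalty, a Gaussian kernel, Poisson subsampling (each individual i kept independently with inclusion probability pi_i), and user-supplied probabilities. The optimal subsampling scheme from the paper is not reproduced here. The function names `gaussian_kernel`, `poisson_subsample` and `ht_kernel_ridge` are hypothetical.

```python
import numpy as np

# Illustrative sketch only: squared loss, Gaussian kernel and Poisson
# subsampling are assumptions, not the paper's exact setting.

def gaussian_kernel(a, b, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix between the rows of a and the rows of b
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def poisson_subsample(pi, rng):
    # Keep individual i independently with probability pi[i]
    return np.flatnonzero(rng.random(pi.shape[0]) < pi)

def ht_kernel_ridge(x, y, pi, lam, rng, bandwidth=1.0):
    """Fit f on a Poisson subsample S, weighting each kept point by 1 / pi_i.

    Minimizes (1/n) * sum_{i in S} (y_i - f(x_i))^2 / pi_i + lam * ||f||_H^2,
    a Horvitz-Thompson estimate of the full-sample regularized risk.
    By the representer theorem f = sum_{j in S} alpha_j k(x_j, .), and the
    first-order condition gives alpha = (K_S + n * lam * diag(pi_S))^{-1} y_S.
    """
    n = x.shape[0]
    s = poisson_subsample(pi, rng)
    xs, ys, ps = x[s], y[s], pi[s]
    k = gaussian_kernel(xs, xs, bandwidth)
    # K_S is PSD and diag(pi_S) > 0, so the system is positive definite
    alpha = np.linalg.solve(k + n * lam * np.diag(ps), ys)

    def predict(x_new):
        return gaussian_kernel(x_new, xs, bandwidth) @ alpha

    return predict, s
```

With all pi_i = 1 every individual is kept and the weights vanish, so the sketch reduces to ordinary kernel ridge regression on the full sample. Smaller pi_i shrink the linear system to roughly sum(pi) rows, which is where the computational savings come from.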
From: Maxime Sangnier
[v1] Fri, 19 Jun 2026 09:36:10 UTC (3,648 KB)