





















Abstract:Understanding and certifying the generalization performance of machine learning algorithms -- i.e. obtaining theoretical estimates of the test error from the training error -- is a central theme of statistical learning theory. Among the many complexity measures used to derive such guarantees, Rademacher complexity yields sharp, data-dependent bounds that apply well beyond classical VC-dimension theory. In this study, we formalize the generalization error bound by Rademacher complexity in Lean 4, building on measure-theoretic probability theory available in the Mathlib library. Our development provides a mechanically-checked pipeline from the definitions of empirical and expected Rademacher complexity, through a formal symmetrization argument and a bounded-differences analysis, to high-probability uniform deviation bounds via a formally proved McDiarmid inequality. A key technical contribution is a reusable mechanism for lifting results from countable hypothesis classes (where measurability of suprema is straightforward in Mathlib) to separable topological index sets via a reduction to a countable dense subset. As worked applications of the abstract theorem, we mechanize standard empirical Rademacher bounds for linear predictors under $\ell_2$ and $\ell_1$ regularizations, and we also formalize a Dudley-type entropy integral bound based on covering numbers and a chaining construction.
| Comments: | accepted at ITP2026 |
| Subjects: | Machine Learning (cs.LG); Computation and Language (cs.CL); Statistics Theory (math.ST) |
| Cite as: | arXiv:2503.19605 [cs.LG] |
| (or arXiv:2503.19605v5 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2503.19605 arXiv-issued DOI via DataCite |
From: Sho Sonoda Dr [view email]
[v1]
Tue, 25 Mar 2025 12:40:43 UTC (113 KB)
[v2]
Tue, 1 Apr 2025 02:26:31 UTC (113 KB)
[v3]
Mon, 15 Sep 2025 09:48:25 UTC (119 KB)
[v4]
Fri, 20 Feb 2026 13:15:11 UTC (21 KB)
[v5]
Sun, 24 May 2026 18:17:33 UTC (34 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。