


























Despite being widely adopted as a canonical framework for learning robust models, adversarial training suffers from robust overfitting. Existing empirical measures and theoretical explorations are insufficient to provide satisfying mechanistic insights into the phenomenon. By viewing adversarial training with momentum SGD as a discrete-time dynamical system, we introduce a PAC-Bayesian analytical framework that proves time-resolved robust generalization bounds. Specifically, our framework tracks the closed-form evolution of the posterior mean and covariance under both stationary and non-stationary transient regimes, revealing their connections to the learning rate, the geometry of the loss landscape, and mini-batch stochastic gradients. By empirically approximating the statistical quantities implied by our theory, we offer a unified, mechanistic explanation for robust overfitting. We also illustrate why adversarial weight perturbation reduces the robust generalization gap by suppressing the loss curvature, but its design may be suboptimal for optimization due to over-penalization.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。