Abstract: This paper provides a theoretical account of memorization in stochastic interpolation models. By leveraging closed-form expressions for the optimal velocity field and the associated score function, we show that, in the continuous-time oracle setting, both deterministic and stochastic generation processes recover training samples. Under Euler discretization, generated samples remain centered around training samples, with deviations controlled by the step size. We further analyze generation in the presence of estimation errors and show that accumulated estimation errors control the endpoint deviation from the training set. These results imply that the generated sample admits a representation as a training sample perturbed by three controlled terms: a discretization-induced bound, an estimation-error-induced bound, and stochastic Gaussian noise. Based on this characterization, we provide theoretical definitions of overfitting and underfitting in generative models. Synthetic simulations support our theoretical findings.
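The memorization claim for the deterministic, Euler-discretized case can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes a linear interpolant x_t = (1 - t) z + t x_1 with a standard Gaussian base z and an empirical data distribution that is uniform over a toy training set, in which case the optimal velocity is (E[x_1 | x_t = x] - x) / (1 - t) with softmax posterior weights over the training points. The function names, the four training points, and the step count are hypothetical choices made for illustration.

```python
import numpy as np

def optimal_velocity(x, t, train):
    """Closed-form velocity under the ASSUMED interpolant
    x_t = (1 - t) z + t x_1, z ~ N(0, I), x_1 uniform on `train`."""
    # Given x_1 = x_i, x_t ~ N(t x_i, (1 - t)^2 I), so the posterior over
    # training points is a softmax of scaled negative squared distances.
    sq_dist = ((x[:, None, :] - t * train[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq_dist / (2.0 * (1.0 - t) ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    posterior_mean = weights @ train  # E[x_1 | x_t = x]
    return (posterior_mean - x) / (1.0 - t)

def euler_generate(train, n_steps, n_samples, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, train.shape[1]))
    h = 1.0 / n_steps
    for k in range(n_steps):
        # The last evaluation is at t = 1 - h, so 1 - t never reaches zero.
        x = x + h * optimal_velocity(x, k * h, train)
    return x

def distance_to_training_set(samples, train):
    """Euclidean distance from each sample to its nearest training point."""
    d = np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=-1)
    return d.min(axis=1)

# Hypothetical toy training set: four well-separated points in 2D.
train = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, -4.0]])
samples = euler_generate(train, n_steps=200, n_samples=20)
deviation = distance_to_training_set(samples, train)
print("largest distance to a training point:", deviation.max())
```

Under these assumptions the final Euler step maps each sample onto a posterior-weighted average of the training points, and the weights sharpen as the step size shrinks, so a finer grid should pull the endpoints closer to individual training samples. Rerunning with a smaller `n_steps` gives a rough feel for the step-size dependence described in the abstract.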
From: Yunchen Li
[v1] Sun, 7 Jun 2026 10:14:07 UTC (2,419 KB)