























Abstract: Diffusion-based speech enhancement architectures that pair a deterministic predictor with a learned score network exhibit a sharp, non-smooth transition (a "kink") in the SI-SDR degradation curve at the training-time noise amplitude. We give a pathwise variational-flow analysis that localizes this non-smoothness to the predictor stage. The central identity is an exact factorization of the parametric sensitivity, $\partial \sig^{(M)} / \partial M = K(M) \cdot \partial C_M / \partial M$, where $K(M)$ is a continuous matrix-valued functional of the score Jacobian along the reverse trajectory and $C_M = \Pi(y^{(M)})$ is the predictor output. Under three hypotheses on the reverse-process flow (score-Jacobian continuity, conditioning-Jacobian continuity, and non-degeneracy of $K$), the map $M \mapsto \sig^{(M)}$ fails to be $C^1$ at $M^\ast$ if and only if $M \mapsto \Pi(y^{(M)})$ fails to be $C^1$ at $M^\ast$. We extend this localization to the finite-step Euler--Maruyama sampler actually run at inference. The hypotheses translate into a concrete experimental program; this paper specifies that program and presents the variational structure, while the empirical validation is deferred to a companion experimental report.
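The factorization can be illustrated on a scalar toy problem. The sketch below is a minimal illustration under assumptions that are ours, not the paper's: a hypothetical piecewise-linear `predictor` with a kink at `M_STAR` stands in for $C_M = \Pi(y^{(M)})$, a hypothetical linear `score` replaces the learned score network, and the Euler--Maruyama noise path is frozen so the output is a deterministic function of $C_M$ (a pathwise view). The forward-mode tangent computed by `sensitivity_k` plays the role of $K(M)$. Because the toy score Jacobians are constant and $K \neq 0$, the one-sided derivatives of the output at `M_STAR` should equal $K$ times the one-sided derivatives of the predictor, so the kink in the output sits exactly where the predictor's kink is.

```python
import math
import random

# Hypothetical toy parameters; none of these values come from the paper.
M_STAR = 1.0   # assumed noise amplitude at which the toy predictor kinks
N_STEPS = 50   # number of Euler--Maruyama steps
H = 0.05       # step size
SIGMA = 0.3    # diffusion coefficient of the toy reverse SDE


def predictor(m):
    """Hypothetical stand-in for C_M = Pi(y^(M)): slope 1 below M_STAR, 0.5 above."""
    if m <= M_STAR:
        return m
    return M_STAR + 0.5 * (m - M_STAR)


def score(x, c):
    """Hypothetical linear score s(x, C) = -(x - C); ds/dx = -1, ds/dC = +1."""
    return -(x - c)


def reverse_sample(c, noise, x0=0.0):
    """Euler--Maruyama run with a frozen noise path, so the output depends only on C."""
    x = x0
    for z in noise:
        x = x + H * score(x, c) + SIGMA * math.sqrt(H) * z
    return x


def sensitivity_k(n_steps):
    """Forward-mode tangent K = dx_N/dC propagated through the sampler steps.

    Each step maps a tangent d to (1 + H * ds/dx) * d + H * ds/dC = (1 - H) * d + H.
    For this toy score the step Jacobian is constant, so K is independent of the path.
    """
    d = 0.0  # x0 does not depend on C
    for _ in range(n_steps):
        d = (1.0 - H) * d + H
    return d


# Freeze one noise realization (the pathwise view).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N_STEPS)]


def enhanced(m):
    """Toy enhanced output as a function of the noise amplitude M."""
    return reverse_sample(predictor(m), noise)


K = sensitivity_k(N_STEPS)
eps = 1e-6
left = (enhanced(M_STAR) - enhanced(M_STAR - eps)) / eps   # expected close to K * 1.0
right = (enhanced(M_STAR + eps) - enhanced(M_STAR)) / eps  # expected close to K * 0.5
print(f"K = {K:.6f}, left derivative = {left:.6f}, right derivative = {right:.6f}")
```

Since the frozen noise enters both finite-difference evaluations identically, it cancels, and the mismatch between the left and right derivatives comes entirely from the predictor; replacing `predictor` with a smooth function removes the kink in `enhanced`.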
From: Shuubham Ojha
[v1] Tue, 23 Jun 2026 00:35:55 UTC (10 KB)