Abstract: Despite the generative capabilities of diffusion and flow models, real-image editing remains constrained by a persistent trade-off between semantic editability and structural fidelity. We trace a primary cause of this limitation to the implicit coupling of edit progress with model scale in existing paradigms. Under this coupling, stronger edits typically require visiting noisier states, which spends computation on destabilizing layout before the semantic change is well localized. We introduce NaviEdit, a training-free inference-time controller that decouples edit progress from model scale traversal through a strict self-consistency contract. NaviEdit operates at the rollout level and leaves the underlying pretrained model unchanged. It treats scale as a control input and reallocates a fixed step budget toward semantically responsive intermediate scales instead of destructive high-noise regimes. Experiments show positive average gains across compatible editors and flow backbones, supporting decoupling as a portable inference-time control principle.
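The abstract does not specify NaviEdit's controller, so the following is only a minimal sketch of the step-budget idea, under stated assumptions. Time runs from t = 1 (pure noise) to t = 0 (clean image), a common flow-matching convention. The names `coupled_schedule`, `band_schedule`, and `steps_above`, the 0.8 high-noise threshold, and the band limits are all hypothetical choices made for illustration. The sketch contrasts an SDEdit-style schedule, whose starting noise level is set by edit strength, with one that caps the start and spends most of the same fixed budget in an intermediate band. How edit progress is actually driven inside that band (the paper's self-consistency contract) is not modeled here.

```python
import numpy as np

# Hypothetical threshold: solver steps that start above this time are treated as
# "high-noise" steps (t = 1 is pure noise, t = 0 is the clean image).
HIGH_NOISE = 0.8


def coupled_schedule(strength: float, budget: int) -> np.ndarray:
    """Illustrative SDEdit-style schedule: edit strength sets the starting time,
    so stronger edits start from noisier states. `budget` steps run down to 0."""
    return np.linspace(strength, 0.0, budget + 1)


def band_schedule(t_hi: float, t_lo: float, budget: int,
                  band_frac: float = 0.75) -> np.ndarray:
    """Hypothetical reallocation: the start is capped at t_hi regardless of edit
    strength, and about `band_frac` of the same budget is spent between t_hi
    and t_lo; the remaining steps finish the trajectory to t = 0."""
    if budget < 2 or not (0.0 < t_lo < t_hi <= 1.0):
        raise ValueError("need budget >= 2 and 0 < t_lo < t_hi <= 1")
    # Keep at least one step in the band and at least one step for the tail.
    n_band = min(max(1, round(band_frac * budget)), budget - 1)
    n_tail = budget - n_band
    band = np.linspace(t_hi, t_lo, n_band + 1)
    tail = np.linspace(t_lo, 0.0, n_tail + 1)[1:]  # skip the repeated t_lo
    return np.concatenate([band, tail])


def steps_above(schedule: np.ndarray, threshold: float = HIGH_NOISE) -> int:
    """Count solver steps whose starting time lies above `threshold`."""
    return int(np.sum(schedule[:-1] > threshold))


if __name__ == "__main__":
    budget = 8
    coupled = coupled_schedule(strength=1.0, budget=budget)
    banded = band_schedule(t_hi=0.75, t_lo=0.25, budget=budget)
    print("coupled:", np.round(coupled, 3), "high-noise steps:", steps_above(coupled))
    print("banded: ", np.round(banded, 3), "high-noise steps:", steps_above(banded))
```

With a budget of 8 steps, the full-strength coupled schedule spends 2 steps above t = 0.8, while the banded schedule spends none there and, by construction, places 6 of its 8 steps between t = 0.75 and t = 0.25. The band limits and `band_frac` are free parameters of this sketch, not values reported in the paper.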
| Comments: | Accepted by ICML 2026 |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.21190 [cs.CV] |
| | (or arXiv:2605.21190v2 [cs.CV] for this version) |
| | https://doi.org/10.48550/arXiv.2605.21190 (arXiv-issued DOI via DataCite) |
From: Yang Shi
[v1] Wed, 20 May 2026 13:53:13 UTC (13,474 KB)
[v2] Sat, 23 May 2026 12:57:01 UTC (13,474 KB)