

















Abstract:Bokeh rendering and depth estimation share a fundamental optical connection, yet existing methods fail to fully exploit this reciprocity. Conventional bokeh pipelines rely heavily on noisy depth maps that inevitably introduce visual artifacts. Conversely, existing monocular depth models typically follow two flawed paradigms. Generative diffusion-based frameworks often lack consistent metric scale. Meanwhile, feed-forward metric depth models frequently fail in textureless or distant regions where defocus blur can provide geometric information. We propose BokehDepth, a two-stage framework that treats synthetic defocus as a supervision-free geometric signal. In the first stage, a physically grounded generative model produces calibrated bokeh stacks from a single sharp input without requiring prior depth input. Subsequently, a lightweight defocus-aware aggregation module integrates these stacks into the encoder of a depth estimation framework. This mechanism allows the model to extract consistent geometric features from the defocus dimension while keeping the decoder architecture unchanged. Experiments demonstrate that BokehDepth achieves superior visual bokeh fidelity compared to depth-dependent rendering baselines and consistently enhances the metric accuracy of state-of-the-art monocular depth models.
| Comments: | Project Page: this https URL |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2512.12425 [cs.CV] |
| (or arXiv:2512.12425v2 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2512.12425 arXiv-issued DOI via DataCite |
|
| Journal reference: | ICML 2026 |
From: Hangwei Zhang [view email]
[v1]
Sat, 13 Dec 2025 18:39:23 UTC (48,847 KB)
[v2]
Mon, 25 May 2026 12:13:04 UTC (38,315 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。