





















Abstract: A robot's visual navigation ability is strongly tied to its underlying representation of the world. Unlike classical 3D maps that require globally consistent geometry, image- or object-relative topological graphs almost entirely do away with geometric understanding. However, this comes at the cost of navigation capability, often limiting it to mere teach-and-repeat. In this work, we propose a novel map representation in the form of pixel-relative connectivity, which is geometrically accurate but does not require global geometric consistency. Inspired by recent progress in 3D-grounded image matching, we construct a map from an image sequence through inter-image connectivity based on pixel correspondences in the relative 3D coordinate systems of individual image pairs. We then use this pixel-level graph to perform global path planning by approximating and sparsifying intra-image pixel connectivity. Through this, we derive a "WayPixel Costmap" representation and train a controller conditioned on it to predict a trajectory rollout. We show that this dense pixel-level costmap based on relative geometry is a more accurate conditioning variable for control prediction than its image- and object-level counterparts. This enables a highly capable navigation system, as validated on four types of navigation tasks in simulation and through real-world demonstrations.
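The planning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the node layout `(image_index, pixel_id)`, the function names, the edge weights, and the use of Dijkstra's algorithm as the planner are all assumptions made for illustration. The sketch joins sparsified intra-image edges and correspondence-based inter-image edges into one graph, then reads off a per-pixel cost-to-goal for a single image.

```python
import heapq
from collections import defaultdict

# Hypothetical node layout: (image_index, pixel_id). Not taken from the paper.

def build_graph(intra_edges, inter_edges):
    """Merge intra-image and inter-image edges into one undirected graph.

    Each edge is (node_u, node_v, weight). The weights are assumed to be
    relative-geometry distances; the abstract does not specify them.
    """
    graph = defaultdict(list)
    for u, v, w in list(intra_edges) + list(inter_edges):
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph

def cost_to_goal(graph, goal):
    """Dijkstra from the goal node (an assumed choice of planner)."""
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def costmap_for_image(dist, image_index):
    """Collect the cost-to-goal of every reachable pixel node in one image."""
    return {node[1]: c for node, c in dist.items() if node[0] == image_index}

# Toy example: two images, two sparsified pixels each, one correspondence.
intra = [((0, "a"), (0, "b"), 1.0), ((1, "c"), (1, "d"), 2.0)]
inter = [((0, "b"), (1, "c"), 0.5)]
dist = cost_to_goal(build_graph(intra, inter), goal=(1, "d"))
costmap = costmap_for_image(dist, image_index=0)
```

In this toy setup, pixel `b` in image 0 gets a lower cost than pixel `a` because it has the direct correspondence into image 1. A controller conditioned on such a map would be steered toward the lower-cost pixel.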
| Comments: | 2026 IEEE International Conference on Robotics & Automation (ICRA) |
| Subjects: | Robotics (cs.RO); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.24111 [cs.RO] (or arXiv:2605.24111v1 [cs.RO] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.24111 (arXiv-issued DOI via DataCite, pending registration) |
From: Vansh Garg
[v1] Fri, 22 May 2026 18:18:07 UTC (13,630 KB)