





















Abstract: Offline goal-conditioned reinforcement learning (GCRL) often struggles with long-horizon tasks, where errors in value estimation accumulate and produce unreliable policies. It is typically assumed that effective long-term planning is infeasible without specialized training. In contrast, our work demonstrates that existing GCRL policies can complete long-horizon tasks when combined with a lightweight, training-free planning wrapper. We find that standard goal-conditioned value functions encode locally consistent geometric structure sufficient for planning. Our approach, Test-Time Graph Search (TTGS), constructs a graph over the offline dataset and employs an adaptive subgoal selection strategy. To address unreliable value estimates during shortest-path search, we propose a novel mechanism that softly penalizes long-distance transitions. Our method incurs negligible computational overhead and requires no additional supervision or parameter updates. On the OGBench benchmark, TTGS significantly boosts success rates across multiple base learners and tasks, with primary gains on challenging long-horizon locomotion tasks where some success rates are improved from near-zero to over 90%, often matching or outperforming methods that require complex auxiliary training. Code and videos can be found at this https URL.
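The shortest-path idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the penalty's form, so the quadratic surcharge in `penalized_cost`, the threshold `tau`, and the weight `alpha` are all assumptions made for illustration, and the distance matrix is supplied by hand rather than derived from a learned goal-conditioned value function.

```python
import heapq


def penalized_cost(d, tau=1.0, alpha=4.0):
    """Hypothetical soft penalty (an assumption, not from the paper):
    distances above tau pay an extra quadratic surcharge."""
    return d if d <= tau else d + alpha * (d - tau) ** 2


def shortest_path(dist, start, goal, tau=1.0, alpha=4.0):
    """Dijkstra over a dense graph whose edge costs are penalized distances.

    dist[i][j] is an estimated distance from node i to node j. In a
    TTGS-like setting it would come from a value function; here it is given.
    """
    n = len(dist)
    best = [float("inf")] * n
    prev = [None] * n
    best[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v in range(n):
            if v == u:
                continue
            c = cost + penalized_cost(dist[u][v], tau, alpha)
            if c < best[v]:
                best[v] = c
                prev[v] = u
                heapq.heappush(heap, (c, v))
    # Walk predecessor links back from the goal to recover the path.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], best[goal]


# Four states on a line; the long-range estimate 0 -> 3 is too optimistic (2.5).
dist = [
    [0.0, 1.0, 2.0, 2.5],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [2.5, 2.0, 1.0, 0.0],
]
path_penalized, cost_penalized = shortest_path(dist, 0, 3, tau=1.0, alpha=4.0)
path_plain, cost_plain = shortest_path(dist, 0, 3, tau=1.0, alpha=0.0)
```

In this toy example the unpenalized search trusts the optimistic long-range estimate and jumps directly from node 0 to node 3, while the penalized search prefers the chain of short, locally reliable hops, whose intermediate nodes could then serve as subgoals.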
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2510.07257 [cs.LG] |
| | (or arXiv:2510.07257v2 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2510.07257 (arXiv-issued DOI via DataCite) |
From: Evgenii Opryshko
[v1] Wed, 8 Oct 2025 17:20:53 UTC (1,263 KB)
[v2] Fri, 22 May 2026 22:31:02 UTC (2,499 KB)