





















Abstract:Generative Flow Networks (GFlowNets) excel at sampling diverse, high-reward objects. In many practical applications where active reward queries are infeasible, these models must be trained using static offline datasets. Prevailing training methods typically rely on a proxy model to provide reward feedback for online sampled trajectories. However, constructing a reliable proxy is often challenging due to data scarcity or high evaluation costs. While existing proxy-free approaches attempt to address this, they often impose coarse constraints that limit the model's ability to explore effectively. To overcome these limitations, we propose Trajectory-Distilled GFlowNet (TD-GFN), a novel proxy-free training framework. TD-GFN utilizes inverse reinforcement learning (IRL) to extract dense, transition-level edge rewards from offline trajectories, providing rich structural guidance for efficient exploration. Crucially, to ensure robustness, these rewards guide the policy indirectly through DAG pruning and prioritized backward sampling. This design ensures that gradient updates rely exclusively on ground-truth terminal rewards from the dataset, thereby preventing error propagation. Empirical results demonstrate that TD-GFN significantly outperforms a broad range of existing baselines in both convergence speed and sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.
| Comments: | Camera-ready version accepted at ICML 2026 |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2505.20110 [cs.LG] |
| (or arXiv:2505.20110v3 [cs.LG] for this version) | |
| https://doi.org/10.48550/arXiv.2505.20110 arXiv-issued DOI via DataCite |
From: Chen Ruishuo [view email]
[v1]
Mon, 26 May 2025 15:12:22 UTC (2,420 KB)
[v2]
Fri, 26 Sep 2025 13:08:31 UTC (13,299 KB)
[v3]
Mon, 25 May 2026 16:15:16 UTC (14,269 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。