
























Abstract: Model-based reinforcement learning (RL) methods that leverage search are responsible for many milestone breakthroughs in RL. Sequential Monte Carlo (SMC) recently emerged as an alternative to Monte Carlo Tree Search (MCTS), the algorithm that drove these breakthroughs. SMC is easier to parallelize and better suited to GPU acceleration. However, it suffers from high variance and path degeneracy, which prevent it from scaling well with increased search depth, i.e., increased sequential compute. To address these problems, we introduce Twice Sequential Monte Carlo Tree Search (TSMCTS). Across discrete and continuous environments, TSMCTS outperforms both the SMC baseline and a popular modern version of MCTS as a policy improvement operator, scales favorably with sequential compute, reduces estimator variance, and mitigates the effects of path degeneracy, while retaining the properties that make SMC natural to parallelize.
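
Editorial illustration (not from the paper): the path degeneracy mentioned in the abstract can be seen in a generic particle method with multinomial resampling. The sketch below is not TSMCTS and not the paper's SMC baseline. The function name `surviving_root_ancestors` and the uniformly random weights are hypothetical stand-ins; a real planner would derive weights from rewards or value estimates. It only counts how many distinct depth-0 particles still have descendants after repeated resampling.

```python
import random


def surviving_root_ancestors(num_particles, depth, seed=0):
    """Count distinct depth-0 ancestors left after `depth` resampling steps."""
    rng = random.Random(seed)
    # ancestors[i] is the index of the depth-0 particle that particle i descends from.
    ancestors = list(range(num_particles))
    for _ in range(depth):
        # Hypothetical stand-in weights (assumption): uniform random numbers,
        # not weights computed from any reward or value model.
        weights = [rng.random() for _ in range(num_particles)]
        # Multinomial resampling: draw parent indices in proportion to weight.
        parents = rng.choices(range(num_particles), weights=weights, k=num_particles)
        # Each child inherits the root ancestor of its sampled parent.
        ancestors = [ancestors[p] for p in parents]
    return len(set(ancestors))


if __name__ == "__main__":
    for depth in (1, 4, 16, 64):
        print(depth, surviving_root_ancestors(num_particles=16, depth=depth))
```

For a fixed seed, the count can only stay the same or shrink as depth grows, because each resampling step can drop lineages but never restore them. When few root ancestors survive, the information available about the first action comes from correspondingly few particles, which is one way to read the abstract's point about scaling with search depth.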
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2511.14220 [cs.LG] (or arXiv:2511.14220v3 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2511.14220 (arXiv-issued DOI via DataCite) |
From: Yaniv Oren
[v1] Tue, 18 Nov 2025 07:54:29 UTC (146 KB)
[v2] Mon, 9 Feb 2026 15:18:58 UTC (188 KB)
[v3] Thu, 21 May 2026 10:21:28 UTC (190 KB)