





















Abstract: Recent advances in large language models (LLMs) have empowered autonomous agents to perform multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, providing more reliable and consistent state transitions than environments simulated by LLMs. Moreover, they enable more efficient agent interaction than collecting trajectories from realistic environments. To demonstrate the effectiveness of this resource, we perform large-scale reinforcement learning for multi-turn tool-use agents. Because the environments are fully executable and their database states are accessible, we can also design reliable reward functions. Experiments on three benchmarks show that training exclusively in synthetic environments, rather than benchmark-specific ones, yields strong out-of-distribution generalization. The code is available at this https URL.
| Comments: | Accepted to ICML 2026 |
| Subjects: | Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG) |
| Cite as: | arXiv:2602.10090 [cs.AI] (or arXiv:2602.10090v3 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2602.10090 (arXiv-issued DOI via DataCite) |
From: Zhaoyang Wang
[v1] Tue, 10 Feb 2026 18:55:41 UTC (6,435 KB)
[v2] Wed, 11 Feb 2026 18:20:25 UTC (6,435 KB)
[v3] Fri, 22 May 2026 21:39:46 UTC (6,449 KB)