Abstract: Reinforcement learning (RL) has become a key paradigm for training software engineering (SWE) agents, but existing pipelines typically rely on per-task containers for isolation. At scale, pre-built container images incur substantial storage overhead, slow down environment setup, and require container-management privileges. We propose SWE-MiniSandbox, a lightweight, container-free method that enables scalable RL training of SWE agents without sacrificing isolation. Instead of relying on per-instance containers, SWE-MiniSandbox executes each task in an isolated workspace backed by kernel-level mechanisms, substantially reducing system overhead. It leverages lightweight environment pre-caching techniques to eliminate the need for bulky container images. As a result, our approach lowers disk usage to approximately 5% of that required by container-based pipelines and reduces environment preparation time to about 25% of the container baseline. Empirical results demonstrate that SWE-MiniSandbox achieves evaluation performance comparable to standard container-based pipelines. By removing the dependency on heavy container infrastructure, SWE-MiniSandbox offers a practical and accessible foundation for scaling RL-based SWE agents, particularly in resource-constrained research environments.
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2602.11210 [cs.SE] (or arXiv:2602.11210v4 [cs.SE] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.11210 (arXiv-issued DOI via DataCite)
From: Yuan Danlong
[v1] Wed, 11 Feb 2026 02:33:04 UTC (3,460 KB)
[v2] Mon, 2 Mar 2026 13:00:09 UTC (3,460 KB)
[v3] Fri, 6 Mar 2026 11:45:53 UTC (3,460 KB)
[v4] Thu, 21 May 2026 00:14:25 UTC (3,462 KB)