Abstract: As agentic network management gains popularity, there is a critical need for evaluation frameworks that transcend static, one-shot testing. To address this, we introduce NetAgentBench, a dynamic benchmark that evaluates agent interactions through a Finite State Machine (FSM) formalization guaranteeing determinism, correctness, and bounded execution. This provides the networking landscape with a rigorous foundation to measure complex, multi-turn operational behaviors. Our empirical evaluation of four state-of-the-art LLM agents through diverse network configuration tasks reveals stark deficiencies: while agents can solve basic tasks, they suffer severe exploration meltdowns and coherence collapse during expert-level configurations. Ultimately, NetAgentBench demonstrates that systematically evaluating multi-turn behavioral stability is an indispensable step toward realizing trustworthy, fully autonomous networks.
| Comments: | 9 pages |
| Subjects: | Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL) |
| Cite as: | arXiv:2604.09678 [cs.NI] (or arXiv:2604.09678v1 [cs.NI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2604.09678 (arXiv-issued DOI via DataCite) |
From: Ahmed Twabi
[v1] Fri, 3 Apr 2026 05:11:05 UTC (1,127 KB)