

















Abstract: Multi-agent LLM systems consistently outperform single-agent baselines, yet practitioners still cannot predict which design works for a new task or diagnose why one fails. We argue this gap persists largely because the field lacks a diagnostic framework with measurable primitives and testable predictions. We introduce DIANOIA, a three-channel decomposition of multi-agent reasoning gain into coverage, fidelity, and synthesis, each of which is empirically measurable. From this decomposition, we derive a diagnostic protocol that identifies the bottleneck channels for any given task. We instantiate the protocol as a multi-agent system whose three components mirror the channels: role-diverse proposers for coverage, execution-grounded verification for fidelity, and iterative synthesis. On GSM8K, AIME-2025, MBPP, and BFCL-SP, our method outperforms strong multi-agent baselines under matched token budgets, dominating the Pareto frontier on MBPP at $\sim 5\times$ token savings and reaching $+4.6$ pp at matched cost. On every benchmark, the protocol picks the right bottleneck channels; the system we built around it leads across models. We release code, adapters, diagnostic metrics, and a Claude Code skill at this https URL. DIANOIA reframes multi-agent design as channel-aware resource allocation: diagnose which channel is the bottleneck for your task, then invest tokens accordingly.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.08586 [cs.AI] (or arXiv:2602.08586v3 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.08586 (arXiv-issued DOI via DataCite)
From: Yiming Yang
[v1] Mon, 9 Feb 2026 12:24:56 UTC (1,292 KB)
[v2] Tue, 10 Feb 2026 06:47:22 UTC (1,305 KB)
[v3] Tue, 26 May 2026 13:47:05 UTC (136 KB)