Abstract: Communication is essential for coordination in \emph{cooperative} multi-agent reinforcement learning under partial observability, yet \emph{cross-timestep} delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale by the time it is consumed.
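To make the cross-timestep delay concrete, here is a minimal Python sketch of a message channel. It assumes a single fixed integer delay and messages sent in timestep order. `DelayedChannel` and `Message` are hypothetical names used only for illustration and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List


@dataclass
class Message:
    sender: int
    sent_at: int   # timestep at which the message was generated
    payload: Any


class DelayedChannel:
    """Hypothetical channel that delivers each message `delay` timesteps after it was sent."""

    def __init__(self, delay: int) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        self._in_flight: Deque[Message] = deque()

    def send(self, msg: Message) -> None:
        # Assumes messages are sent in non-decreasing `sent_at` order,
        # so the front of the queue always holds the next message to arrive.
        self._in_flight.append(msg)

    def deliver(self, t: int) -> List[Message]:
        """Pop every message whose arrival time (sent_at + delay) is <= t."""
        arrived: List[Message] = []
        while self._in_flight and self._in_flight[0].sent_at + self.delay <= t:
            arrived.append(self._in_flight.popleft())
        return arrived


# Usage: with delay 2, the observation generated at step t is consumed at step t + 2.
channel = DelayedChannel(delay=2)
for t in range(4):
    channel.send(Message(sender=0, sent_at=t, payload=f"obs@{t}"))
    for m in channel.deliver(t):
        print(t, m.payload, "staleness =", t - m.sent_at)
# prints:
# 2 obs@0 staleness = 2
# 3 obs@1 staleness = 2
```

A receiver acting at step t therefore conditions on information that is already d steps old, which is the source of the temporal misalignment described above.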
We formalize this setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into \emph{communication gain} and \emph{delay cost}, yielding the Communication Gain and Delay Cost (CGDC) metric.
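One natural reading of this decomposition, assuming a message's effect is measured as a change in expected return, is a telescoping identity. Write $V^{\varnothing}$ for the value without the message, $V^{\text{timely}}$ for the value with the message received immediately, and $V^{\text{delay}}$ for the value with it received after the delay (this notation is ours, not the paper's):

```latex
\underbrace{V^{\text{delay}} - V^{\varnothing}}_{\text{net effect of the delayed message}}
  = \underbrace{\bigl(V^{\text{timely}} - V^{\varnothing}\bigr)}_{\text{communication gain}}
  - \underbrace{\bigl(V^{\text{timely}} - V^{\text{delay}}\bigr)}_{\text{delay cost}}
```

Under this reading, a positive net effect means the gain from communicating outweighs the cost of staleness; the paper's exact CGDC definition may estimate or weight these terms differently.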
We further establish a value-loss bound showing that the degradation induced by delayed messages is upper-bounded by a discounted accumulation of an information gap between the action distributions induced by timely and delayed messages.
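The abstract does not state which divergence defines the information gap or which constant scales the bound. Assuming total variation (TV) distance between per-step action distributions, the right-hand side of such a bound, up to an unspecified constant, can be computed along a trajectory as in the sketch below. `total_variation` and `discounted_gap` are illustrative names, not the paper's.

```python
from typing import Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete action distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same action set")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def discounted_gap(
    timely: Sequence[Sequence[float]],
    delayed: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> float:
    """Sum over t of gamma**t * TV(timely[t], delayed[t]) along one trajectory."""
    if len(timely) != len(delayed):
        raise ValueError("both trajectories must have the same length")
    return sum(
        gamma ** t * total_variation(p, q)
        for t, (p, q) in enumerate(zip(timely, delayed))
    )


# Usage: the two policies disagree only at t = 0, so the accumulated gap is about 0.3.
timely_pi = [[0.9, 0.1], [0.8, 0.2]]
delayed_pi = [[0.6, 0.4], [0.8, 0.2]]
print(discounted_gap(timely_pi, delayed_pi, gamma=0.5))
```

TV is a convenient assumption here because, for bounded rewards, a per-step TV gap bounds the per-step difference in expected reward; a KL-based gap would be an equally plausible choice.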
Guided by CGDC, we propose \textbf{CDCMA}, an actor--critic framework that requests messages only when the predicted CGDC is positive, predicts future observations to reduce misalignment when messages are consumed, and fuses delayed messages via CGDC-guided attention.
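Two of the three CDCMA components can be sketched in a few lines. The sketch below assumes one scalar predicted CGDC per teammate, requests a message when that value is strictly positive, and implements "CGDC-guided attention" as scaled dot-product attention whose logits are shifted by each message's CGDC score. That additive-bias form and the names `should_request` and `cgdc_attention` are hypothetical; the future-observation predictor is not shown.

```python
import math
from typing import List, Sequence, Tuple


def should_request(predicted_cgdc: float, threshold: float = 0.0) -> bool:
    """Request a teammate's message only if its predicted CGDC exceeds the threshold."""
    return predicted_cgdc > threshold


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def cgdc_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    cgdc_scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse delayed messages; a higher CGDC score (assumed additive bias) raises a message's weight."""
    if not keys or not (len(keys) == len(values) == len(cgdc_scores)):
        raise ValueError("need at least one message and matching lengths")
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) + s
        for key, s in zip(keys, cgdc_scores)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


# Usage: teammate 2 has a negative predicted CGDC, so only teammates 1 and 3 are asked.
predicted = {1: 0.4, 2: -0.1, 3: 0.2}
requested = [j for j, s in predicted.items() if should_request(s)]  # [1, 3]
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]     # keys of the delayed messages from teammates 1 and 3
values = [[2.0, 0.0], [0.0, 2.0]]
fused, weights = cgdc_attention(query, keys, values, [predicted[j] for j in requested])
print(requested, [round(w, 3) for w in weights])
```

Adding the score to the logits keeps the weights a proper softmax distribution while letting messages with a higher expected net benefit dominate the fused representation.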
Experiments on no-teammate-vision variants of Cooperative Navigation and Predator Prey and on SMAC maps, across multiple delay levels, show consistent improvements in performance, robustness, and generalization, and ablations validate each component.
| Subjects: | Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA) |
| Cite as: | arXiv:2604.03785 [cs.AI] |
| | (or arXiv:2604.03785v2 [cs.AI] for this version) |
| | https://doi.org/10.48550/arXiv.2604.03785 arXiv-issued DOI via DataCite |
From: Zihong Gao
[v1] Sat, 4 Apr 2026 16:14:41 UTC (6,513 KB)
[v2] Tue, 26 May 2026 03:49:50 UTC (6,691 KB)