
























Abstract: In this paper, we investigate whether deep reinforcement-learning agents interacting in a shared optimal-execution environment can sustain supra-competitive outcomes, that is, achieve lower implementation shortfalls than the relevant game-theoretic competitive benchmark. We study a two-agent Almgren-Chriss liquidation game and examine how learned behavior depends on intra-episode environment feedback, the ability to interpret the mid-price, and the agent's knowledge of the past. We first use ex-ante schedule-learning agents, which commit to complete liquidation trajectories before execution begins, to remove intra-episode feedback and isolate what can arise without it. We then allow agents to condition on the evolving state using a variety of DDQN architectures. We find that, when agents are given access to intra-episode history, especially recent prices and their own past actions, supra-competitive outcomes become substantially more frequent and more persistent. These findings indicate that supra-competitive behavior in this execution game is driven not by multi-agent learning or by current price observation alone, but by feedback, memory, and state-contingent interaction along the realized execution path.
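To make the outcome measure concrete, the following is a minimal sketch of how implementation shortfall could be computed for two agents following fixed (ex-ante) liquidation schedules in a discrete, noise-free Almgren-Chriss setting. It is not the authors' environment. The linear impact model, the assumption that permanent impact depends on the agents' combined sales while temporary impact depends only on each agent's own sale, the absence of price noise and fixed spread costs, and the function name `implementation_shortfall` with its parameters (`s0`, `gamma`, `eta`, `tau`) are all illustrative assumptions.

```python
import numpy as np


def implementation_shortfall(schedules, s0, gamma, eta, tau=1.0):
    """Implementation shortfall per agent for fixed sell schedules.

    Hypothetical sketch (not the paper's environment), assuming:
      - schedules[i, k] = shares agent i sells in period k;
      - linear temporary impact on each agent's OWN trade: the execution
        price is the pre-trade mid minus eta * n / tau;
      - linear permanent impact from the AGENTS' COMBINED sales: the mid
        drops by gamma * (total shares sold) after each period;
      - no price noise and no fixed spread cost.
    """
    schedules = np.asarray(schedules, dtype=float)
    n_agents, n_periods = schedules.shape
    mid = float(s0)
    revenue = np.zeros(n_agents)
    for k in range(n_periods):
        trades = schedules[:, k]
        # Each agent sells at the pre-trade mid, less its own temporary impact.
        exec_prices = mid - eta * trades / tau
        revenue += trades * exec_prices
        # Aggregate selling permanently shifts the mid for the next period.
        mid -= gamma * trades.sum()
    # Shortfall = paper value of the initial holding minus realized revenue.
    initial_value = schedules.sum(axis=1) * s0
    return initial_value - revenue


if __name__ == "__main__":
    # Both agents split 10 shares evenly over two periods (TWAP-like).
    print(implementation_shortfall([[5, 5], [5, 5]], s0=100.0, gamma=0.1, eta=0.05))
    # Agent 0 front-loads all 10 shares; agent 1 keeps the even split.
    print(implementation_shortfall([[10, 0], [5, 5]], s0=100.0, gamma=0.1, eta=0.05))
```

Under these assumed parameters, the even split gives each agent a shortfall of 7.5. When agent 0 front-loads, its own shortfall falls to 5.0 while agent 1's rises to 10.0, because agent 0's early selling depresses the mid-price at which agent 1 later trades. This cross-agent externality is what turns joint liquidation into a game.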
| Subjects: | Computational Finance (q-fin.CP); Artificial Intelligence (cs.AI) |
|---|---|
| Cite as: | arXiv:2605.20348 [q-fin.CP] |
| | (or arXiv:2605.20348v1 [q-fin.CP] for this version) |
| | https://doi.org/10.48550/arXiv.2605.20348 (arXiv-issued DOI via DataCite) |
From: Christos Spyridon Koulouris
[v1] Tue, 19 May 2026 18:03:48 UTC (12,432 KB)