
























Abstract: Two-player games such as board games have long served as benchmarks for reinforcement learning. This work revisits a policy optimization method that combines reverse Kullback-Leibler regularization with entropy regularization, and analyzes the combination in two-player zero-sum settings from theoretical and empirical perspectives. On the theoretical side, we investigate the stability of the policy update rule in two settings: normal-form games and finite-length games. We provide novel convergence guarantees and verify them through numerical experiments on synthetic games. On the empirical side, we derive a practical model-free reinforcement learning algorithm based on the regularized policy optimization and evaluate its training efficiency through experiments on five board games: Animal Shogi, Gardner Chess, Go, Hex, and Othello. Experimental results show that our agent learns more efficiently than existing methods across these environments.
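To make the normal-form setting concrete, the following is a minimal sketch of a KL- and entropy-regularized policy update in a zero-sum matrix game. It is an illustration under stated assumptions, not the paper's algorithm: it takes the KL term to be KL(p || current policy), uses the standard closed-form maximizer of `<p, q> - (1/eta) * KL(p || pi) + tau * H(p)`, and updates both players simultaneously. The names `regularized_step`, `run_selfplay`, `eta`, and `tau` are hypothetical and do not come from the abstract.

```python
import numpy as np


def regularized_step(policy, payoff, eta, tau):
    """Closed-form maximizer of <p, payoff> - (1/eta) * KL(p || policy) + tau * H(p).

    Assumes `policy` is strictly positive so that its logarithm is finite.
    """
    scale = 1.0 + eta * tau
    logits = (np.log(policy) + eta * payoff) / scale
    logits = logits - logits.max()  # shift for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()


def run_selfplay(A, x, y, eta=0.1, tau=1.0, steps=2000):
    """Simultaneous regularized updates for both players of a zero-sum matrix game.

    The row player maximizes x^T A y; the column player minimizes it.
    """
    for _ in range(steps):
        row_payoff = A @ y        # expected payoff of each row action
        col_payoff = -(A.T @ x)   # column player receives the negated payoff
        x, y = (
            regularized_step(x, row_payoff, eta, tau),
            regularized_step(y, col_payoff, eta, tau),
        )
    return x, y


# Rock-paper-scissors payoff matrix for the row player (illustrative choice).
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x0 = np.array([0.7, 0.2, 0.1])
y0 = np.array([0.1, 0.3, 0.6])
x, y = run_selfplay(A, x0, y0)
print(x, y)
```

A fixed point of this update satisfies `x ∝ exp(A @ y / tau)` and `y ∝ exp(-A.T @ x / tau)`, which in rock-paper-scissors is the uniform policy for both players. The step size and regularization strength used here are arbitrary illustrative values, chosen large enough in `tau` that the iteration settles quickly.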
| Comments: | Accepted at ICML 2026 |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2602.10894 [cs.LG] (or arXiv:2602.10894v2 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2602.10894 (arXiv-issued DOI via DataCite) |
From: Kazuki Ota
[v1] Wed, 11 Feb 2026 14:25:38 UTC (4,679 KB)
[v2] Thu, 21 May 2026 09:51:26 UTC (5,027 KB)