





















This paper presents a robust reinforcement learning algorithm called robust deterministic policy gradient (RDPG), which reformulates the H-infinity control problem as a two-player zero-sum dynamic game between a user (the controller) and an adversary (the disturbance). The method combines deterministic policy gradients with deep reinforcement learning to train a robust policy that attenuates disturbances. A practical variant, robust deep deterministic policy gradient (RDDPG), integrates twin-delayed updates for stability and sample efficiency. Experiments on an unmanned aerial vehicle demonstrate superior robustness and tracking accuracy under severe disturbance conditions.
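
For readers unfamiliar with the game-theoretic view of H-infinity control, the reformulation mentioned above is commonly written along the following lines. This is a minimal sketch in generic textbook notation, not the paper's own formulation: the dynamics $f$, stage cost $c$, deterministic policies $\pi$ (user) and $\mu$ (adversary), and attenuation level $\eta$ are all assumed symbols introduced here for illustration.

```latex
% Hypothetical notation for illustration; the paper's symbols may differ.
% Closed-loop dynamics with control u_k and disturbance w_k:
x_{k+1} = f(x_k, u_k, w_k), \qquad u_k = \pi(x_k), \qquad w_k = \mu(x_k)

% Zero-sum objective: the user pays the stage cost, the adversary pays
% a penalty \eta^2 \lVert w_k \rVert^2 for injecting disturbance energy.
J(\pi, \mu) = \sum_{k=0}^{\infty} \left( c(x_k, u_k) - \eta^2 \lVert w_k \rVert^2 \right)

% The user minimizes and the adversary maximizes the same objective:
\pi^\star \in \arg\min_{\pi} \max_{\mu} J(\pi, \mu)
```

Under these assumptions, if $\max_{\mu} J(\pi, \mu) \le 0$ from a zero initial state, then $\sum_k c(x_k, u_k) \le \eta^2 \sum_k \lVert w_k \rVert^2$ for every disturbance sequence, which is the disturbance-attenuation guarantee that H-infinity control seeks.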