




























We address how to exploit power control data, gathered from a monitored environment, for performing power control in an unexplored environment. We adopt offline deep reinforcement learning, whereby the agent learns the policy to produce the transmission powers solely by using the data. Experiments demonstrate that despite discrepancies between the monitored and unexplored environments, the agent successfully learns the power control very quickly, even if the objective functions in the monitored and unexplored environments are dissimilar. About one third of the collected data is sufficient to be of high-quality and the rest can be from any sub-optimal algorithm.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。