




























This paper describes a sequential, or online, learning scheme for adaptive radar transmissions that facilitate spectrum sharing with a non-cooperative cellular network. First, the interference channel between the radar and a spatially distant cellular network is modeled. Then, a linear Contextual Bandit (CB) learning framework is applied to drive the radar's behavior. The fundamental trade-off between exploration and exploitation is balanced by a proposed Thompson Sampling (TS) algorithm, a pseudo-Bayesian approach which selects waveform parameters based on the posterior probability that a specific waveform is optimal, given discounted channel information as context. It is shown that the contextual TS approach converges more rapidly to behavior that minimizes mutual interference and maximizes spectrum utilization than comparable contextual bandit algorithms. Additionally, we show that the TS learning scheme results in a favorable SINR distribution compared to other online learning algorithms. Finally, the proposed TS algorithm is compared to a deep reinforcement learning model. We show that the TS algorithm maintains competitive performance with a more complex Deep Q-Network (DQN).
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。