
























Abstract: We establish novel and general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. We apply these results to analyze the performance of the Temporal Difference (TD) learning algorithm with linear function approximations, a widely used method for policy evaluation in Reinforcement Learning (RL), obtaining a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an $O(T^{-\frac{1}{4}}\log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. Our martingale bounds are of broad applicability, and our analysis of TD learning provides new insights into statistical inference for RL algorithms, bridging gaps between classical stochastic approximation theory and modern RL applications.
| Subjects: | Machine Learning (stat.ML); Machine Learning (cs.LG) |
| Cite as: | arXiv:2502.13822 [stat.ML] (or arXiv:2502.13822v3 [stat.ML] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2502.13822 (arXiv-issued DOI via DataCite) |
From: Weichen Wu
[v1] Wed, 19 Feb 2025 15:33:55 UTC (171 KB)
[v2] Sat, 6 Sep 2025 14:14:25 UTC (176 KB)
[v3] Thu, 21 May 2026 08:10:44 UTC (175 KB)