arXiv Analytics

arXiv:2406.07892 [cs.LG]

Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP

Tejaram Sangadi, L. A. Prashanth, Krishna Jagannathan

Published 2024-06-12 (Version 1)

Motivated by risk-sensitive reinforcement learning scenarios, we consider the problem of policy evaluation for the variance of the return in a discounted reward Markov decision process (MDP). For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature, though only asymptotic guarantees are available for it. We derive finite-sample bounds that hold (i) in the mean-squared sense and (ii) with high probability, when tail iterate averaging is employed with and without regularization. Our bounds exhibit an exponential decay of the initial error, while the overall bound is $O(1/t)$, where $t$ is the number of update iterations of the TD algorithm. Further, the bound for the regularized TD variant holds for a universal step size. Our bounds open avenues for the analysis of actor-critic algorithms for mean-variance optimization in a discounted MDP.
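
To illustrate the kind of procedure the abstract refers to, the sketch below shows tail-averaged TD-style updates with linear function approximation for the mean and second moment of the return (from which a variance estimate follows), with optional regularization. This is a minimal illustration, not the authors' exact algorithm: the coupled value/second-moment targets, the feature map `phi`, the step size `alpha`, the regularization weight `reg`, and the environment interface are all assumptions made for the example.

```python
# Minimal sketch (illustrative, not the paper's exact algorithm): coupled TD(0)
# updates with linear function approximation for the value and the second
# moment of the return, with tail iterate averaging and optional regularization.
import numpy as np

def td_mean_variance(env, policy, phi, d, gamma=0.9, alpha=0.05,
                     num_iters=10_000, tail_fraction=0.5, reg=0.0):
    """Tail-averaged TD with LFA for the mean and second moment of the return.

    Returns tail-averaged parameters (v_bar, w_bar); a variance estimate at a
    state s is phi(s) @ w_bar - (phi(s) @ v_bar) ** 2.
    """
    v = np.zeros(d)          # value-function parameters
    w = np.zeros(d)          # second-moment parameters
    v_sum = np.zeros(d)      # accumulators for tail iterate averaging
    w_sum = np.zeros(d)
    tail_start = int(num_iters * (1.0 - tail_fraction))

    s = env.reset()
    for t in range(num_iters):
        a = policy(s)
        s_next, r, done = env.step(a)   # assumed environment interface
        f, f_next = phi(s), phi(s_next)

        # TD errors for the value and the second moment of the return:
        #   V(s) ~ r + gamma * V(s')
        #   M(s) ~ r^2 + 2 * gamma * r * V(s') + gamma^2 * M(s')
        delta_v = r + gamma * (f_next @ v) - f @ v
        delta_w = (r ** 2 + 2 * gamma * r * (f_next @ v)
                   + gamma ** 2 * (f_next @ w) - f @ w)

        # Stochastic-approximation updates, with optional regularization.
        v += alpha * (delta_v * f - reg * v)
        w += alpha * (delta_w * f - reg * w)

        # Average only the tail iterates.
        if t >= tail_start:
            v_sum += v
            w_sum += w

        s = env.reset() if done else s_next

    n_tail = num_iters - tail_start
    return v_sum / n_tail, w_sum / n_tail
```

In this style of scheme, averaging only the tail iterates discards the transient phase of the updates, which is consistent with the bounds described above: the initial error decays quickly, while the averaged tail governs the $O(1/t)$ rate.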

Related articles:
arXiv:1806.02450 [cs.LG] (Published 2018-06-06)
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
arXiv:2210.05918 [cs.LG] (Published 2022-10-12)
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
arXiv:1809.07435 [cs.LG] (Published 2018-09-20)
Predicting Periodicity with Temporal Difference Learning