arXiv:2306.09746 [cs.LG]

Temporal Difference Learning with Experience Replay

Han-Dong Lim, Donghwan Lee

Published 2023-06-16 (Version 1)

Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, only recently have researchers begun to actively study its finite-time behavior, including finite-time bounds on the mean squared error and the sample complexity. On the empirical side, experience replay has been a key ingredient in the success of deep RL algorithms, but its theoretical effects on RL have yet to be fully understood. In this paper, we present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay. Specifically, under the Markovian observation model, we demonstrate that, for both the averaged iterate and the final iterate, the error term induced by a constant step-size can be effectively controlled by the size of the replay buffer and of the mini-batch sampled from it.
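
To make the setting concrete, the sketch below shows linear TD(0) with a FIFO replay buffer, mini-batch sampling, a constant step-size, and both the final and the averaged iterate. This is a minimal illustration of the algorithmic setup described in the abstract, not the authors' code; the environment interface (`env.reset`, `env.sample_action`, `env.step`), the feature map `featurize`, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np
from collections import deque

def td0_with_replay(env, featurize, d, gamma=0.99, alpha=0.05,
                    buffer_size=10_000, batch_size=64, num_steps=50_000):
    """Linear TD(0) with experience replay and a constant step-size.

    Returns the final iterate and a running average of the iterates,
    the two cases considered in the paper.  The environment interface
    used here is an assumption made for this sketch.
    """
    theta = np.zeros(d)          # linear value-function parameters
    theta_avg = np.zeros(d)      # running average of the iterates
    buffer = deque(maxlen=buffer_size)   # FIFO replay buffer of fixed size

    s = env.reset()
    for t in range(1, num_steps + 1):
        # Collect one Markovian transition under a fixed behavior policy.
        a = env.sample_action(s)
        s_next, r = env.step(s, a)
        buffer.append((featurize(s), r, featurize(s_next)))
        s = s_next

        if len(buffer) >= batch_size:
            # Sample a mini-batch uniformly from the replay buffer and
            # average the TD(0) semi-gradients over it.
            idx = np.random.randint(len(buffer), size=batch_size)
            g = np.zeros(d)
            for i in idx:
                phi, r_i, phi_next = buffer[i]
                td_error = r_i + gamma * phi_next @ theta - phi @ theta
                g += td_error * phi
            theta += alpha * g / batch_size   # constant step-size update

        theta_avg += (theta - theta_avg) / t  # averaged iterate

    return theta, theta_avg
```

In this sketch, `buffer_size` and `batch_size` are the two quantities that the abstract identifies as controlling the error term induced by the constant step-size `alpha`.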

Related articles:
arXiv:1809.07435 [cs.LG] (Published 2018-09-20)
Predicting Periodicity with Temporal Difference Learning
arXiv:2203.04955 [cs.LG] (Published 2022-03-09)
Temporal Difference Learning for Model Predictive Control
arXiv:1902.00923 [cs.LG] (Published 2019-02-03)
Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning