arXiv:2109.08134 [cs.LG]

Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning

Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez

Published: 2021-09-16 (Version 1)

In batch reinforcement learning, poorly explored state-action pairs can lead to inaccurate learned models and, in turn, to poorly performing policies. Various regularization methods can mitigate the problem of learning overly complex models in Markov decision processes (MDPs); however, they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework: a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
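As a rough illustration of the weighted average transition matrix idea, one way to read it is as a convex combination of the empirical transition estimate from the batch and a simpler reference transition model. The sketch below assumes that form; the array shapes, the uniform reference model, and the per-state-action weight `w` are illustrative assumptions, not the paper's exact formulation of the three methods.

```python
import numpy as np

def weighted_average_transitions(P_hat, P_ref, w):
    """Blend an empirical transition model with a reference model.

    Sketch of a 'weighted average transition matrix': each state-action
    pair's next-state distribution is a convex combination of the
    empirical estimate P_hat[s, a] and a reference distribution
    P_ref[s, a] (here uniform), with weight w[s, a] on the empirical part.
    """
    # P_hat, P_ref: shape (S, A, S); w: shape (S, A) with values in [0, 1]
    return w[..., None] * P_hat + (1.0 - w[..., None]) * P_ref

# Toy usage: 3 states, 2 actions, uniform reference model.
S, A = 3, 2
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(S), size=(S, A))   # empirical estimates
P_ref = np.full((S, A, S), 1.0 / S)              # uniform reference
w = np.full((S, A), 0.8)                         # weight on the empirical part
P_reg = weighted_average_transitions(P_hat, P_ref, w)
assert np.allclose(P_reg.sum(axis=-1), 1.0)      # rows remain distributions
```

In this reading, a weight near 1 trusts the batch estimate for well-explored state-action pairs, while a smaller weight shrinks poorly explored pairs toward the reference model.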

Comments: ICML Workshop on Reinforcement Learning Theory 2021
Categories: cs.LG, stat.ML
Related articles:
arXiv:1910.05821 [cs.LG] (Published 2019-10-13)
Policy Poisoning in Batch Reinforcement Learning and Control
arXiv:1905.00360 [cs.LG] (Published 2019-05-01)
Information-Theoretic Considerations in Batch Reinforcement Learning
arXiv:2003.03924 [cs.LG] (Published 2020-03-09)
$Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison