arXiv:2109.08134 [cs.LG]

Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning

Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez

Published: 2021-09-16 (Version 1)

In batch reinforcement learning, poorly explored state-action pairs can lead to inaccurate learned models and, in turn, to poorly performing policies. Various regularization methods can mitigate the problem of learning overly complex models in Markov decision processes (MDPs); however, they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework: a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
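As a rough illustration of the weighted average transition matrix idea, one way to read it is as a convex combination of the empirical transition estimate from the batch and a simpler reference transition model. The sketch below assumes that form; the array shapes, the uniform reference model, and the per-state-action weight `w` are illustrative assumptions, not the paper's exact formulation of the three methods.

```python
import numpy as np

def weighted_average_transitions(P_hat, P_ref, w):
    """Blend an empirical transition model with a reference model.

    Sketch of a 'weighted average transition matrix': each state-action
    pair's next-state distribution is a convex combination of the
    empirical estimate P_hat[s, a] and a reference distribution
    P_ref[s, a] (here uniform), with weight w[s, a] on the empirical part.
    """
    # P_hat, P_ref: shape (S, A, S); w: shape (S, A) with values in [0, 1]
    return w[..., None] * P_hat + (1.0 - w[..., None]) * P_ref

# Toy usage: 3 states, 2 actions, uniform reference model.
S, A = 3, 2
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(S), size=(S, A))   # empirical estimates
P_ref = np.full((S, A, S), 1.0 / S)              # uniform reference
w = np.full((S, A), 0.8)                         # weight on the empirical part
P_reg = weighted_average_transitions(P_hat, P_ref, w)
assert np.allclose(P_reg.sum(axis=-1), 1.0)      # rows remain distributions
```

In this reading, a weight near 1 trusts the batch estimate for well-explored state-action pairs, while a smaller weight shrinks poorly explored pairs toward the reference model.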

Comments: ICML Workshop on Reinforcement Learning Theory 2021
Categories: cs.LG, stat.ML
Related articles:
arXiv:1910.05821 [cs.LG] (Published 2019-10-13)
Policy Poisoning in Batch Reinforcement Learning and Control
arXiv:1905.00360 [cs.LG] (Published 2019-05-01)
Information-Theoretic Considerations in Batch Reinforcement Learning
arXiv:2003.03924 [cs.LG] (Published 2020-03-09)
$Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison