
arXiv:2201.08536 [stat.ML]

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

Published 2022-01-21 (Version 1)

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort to obtain instance-dependent guarantees and derive instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain, ex post, the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-specific difficulty of the problem and allows for early termination on problems with favorable structure.
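To make the idea of a data-dependent stopping rule concrete, here is a minimal sketch in Python. It is not the authors' construction: the function name td0_with_early_stopping, the sampler sample_transition, and the empirical-Bernstein-style confidence width are all illustrative stand-ins for the sharp instance-dependent confidence regions derived in the paper. The sketch runs TD(0) policy evaluation under a generative model, tracks per-state variance of the TD targets, and terminates once a heuristic width falls below a target accuracy eps.

```python
import numpy as np

def td0_with_early_stopping(sample_transition, n_states, gamma, eps,
                            delta=0.05, max_iters=100_000, seed=0):
    """Illustrative TD(0) policy evaluation with a data-dependent stopping
    rule. The width below is a crude empirical-variance heuristic, NOT the
    paper's instance-dependent confidence region.

    sample_transition(s) -> (reward, next_state) under the fixed policy
    (an assumed user-supplied generative model).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)
    # Running statistics for a per-state empirical variance of TD targets.
    counts = np.zeros(n_states)
    mean_tgt = np.zeros(n_states)
    m2_tgt = np.zeros(n_states)

    for t in range(1, max_iters + 1):
        s = int(rng.integers(n_states))     # uniform state sampling
        r, s_next = sample_transition(s)
        target = r + gamma * V[s_next]

        # Welford update of the empirical target variance at state s.
        counts[s] += 1
        d = target - mean_tgt[s]
        mean_tgt[s] += d / counts[s]
        m2_tgt[s] += d * (target - mean_tgt[s])

        # 1/n(s) step size (sample-average iterate).
        V[s] += (target - V[s]) / counts[s]

        # Heuristic instance-dependent width: larger where TD targets are
        # noisier, shrinking at a 1/sqrt(n) rate per state.
        if t % (10 * n_states) == 0 and counts.min() > 1:
            var = m2_tgt / np.maximum(counts - 1, 1)
            width = np.sqrt(2 * var * np.log(2 * n_states / delta) / counts)
            width /= (1 - gamma)            # propagate through the Bellman fixed point
            if width.max() < eps:           # stop early on easy instances
                return V, t
    return V, max_iters
```

On a favorable instance, where TD targets have low variance, the width shrinks quickly and the loop exits well before max_iters; on a hard instance it runs longer, mirroring the adaptive behavior the abstract describes.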

Related articles:
arXiv:2007.06827 [stat.ML] (Published 2020-07-14)
Early stopping and polynomial smoothing in regression with reproducing kernels
arXiv:2301.11556 [stat.ML] (Published 2023-01-27)
Conformal inference is (almost) free for neural networks trained with early stopping
arXiv:2310.02581 [stat.ML] (Published 2023-10-04)
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning