arXiv:1907.05634 [cs.LG]

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

Yuping Luo, Huazhe Xu, Tengyu Ma

Published 2019-07-12 (Version 1)

Imitation learning, followed by reinforcement learning, is a promising paradigm for solving complex control tasks sample-efficiently. However, learning from demonstrations often suffers from the covariate shift problem, which results in cascading errors of the learned policy. We introduce a notion of conservatively-extrapolated value functions, which provably lead to policies with self-correction. We design an algorithm, Value Iteration with Negative Sampling (VINS), that practically learns such value functions with conservative extrapolation. We show that VINS can correct mistakes of the behavioral cloning policy on simulated robotics benchmark tasks. We also propose using VINS to initialize a reinforcement learning algorithm, which is shown to significantly outperform prior work in sample efficiency.
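To make the two mechanisms in the abstract concrete, below is a minimal PyTorch sketch of (a) value fitting with negative sampling and (b) a value-guided correction of the behavioral cloning action. All names, network sizes, the specific margin loss, and the learned dynamics model here are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 4, 2  # illustrative dimensions, not from the paper

# Hypothetical networks: a value function and a learned dynamics model.
value_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
dynamics_net = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM)
)

def vins_value_loss(demo_states, demo_values, noise_scale=0.1, penalty=1.0):
    """Value fitting with negative sampling (one batch).

    demo_states: (B, STATE_DIM) states drawn from demonstrations
    demo_values: (B, 1) regression targets, e.g. returns along the demo
    """
    # Ordinary regression on demonstration states.
    fit_loss = ((value_net(demo_states) - demo_values) ** 2).mean()

    # Negative sampling: perturb states off the demonstration manifold and
    # push their values *below* the nearby demo value by a margin that grows
    # with the perturbation size -- the conservative-extrapolation property.
    noise = noise_scale * torch.randn_like(demo_states)
    neg_states = demo_states + noise
    neg_targets = demo_values.detach() - penalty * noise.norm(dim=-1, keepdim=True)
    neg_loss = ((value_net(neg_states) - neg_targets) ** 2).mean()

    return fit_loss + neg_loss

def self_correcting_action(state, bc_action, n_candidates=16, radius=0.1):
    """Value-guided correction: among random candidates near the behavioral
    cloning action, pick the one whose predicted next state has the highest
    learned value. Because off-manifold states were assigned pessimistically
    low values, this step steers the policy back toward the demonstrations."""
    candidates = bc_action + radius * torch.randn(n_candidates, ACTION_DIM)
    next_states = dynamics_net(
        torch.cat([state.expand(n_candidates, -1), candidates], dim=-1)
    )
    return candidates[value_net(next_states).argmax()]

# Illustrative usage with random placeholder data:
states, values = torch.randn(32, STATE_DIM), torch.randn(32, 1)
vins_value_loss(states, values).backward()
```

The design intent the sketch tries to capture: because perturbed states receive values strictly lower than their nearby demonstration states, greedily improving the value (as in self_correcting_action) pushes the agent back toward demonstrated behavior rather than extrapolating optimistically off-distribution.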

Related articles:
arXiv:2009.14108 [cs.LG] (Published 2020-09-29)
Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
arXiv:1910.02760 [cs.LG] (Published 2019-10-07)
Negative Sampling in Variational Autoencoders
arXiv:2406.14951 [cs.LG] (Published 2024-06-21)
An Idiosyncrasy of Time-discretization in Reinforcement Learning