arXiv Analytics


arXiv:1811.06521 [cs.LG]

Reward learning from human preferences and demonstrations in Atari

Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei

Published 2018-11-15 (Version 1)

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
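
As a rough illustration of the preference half of this setup, the sketch below fits a reward model to pairwise trajectory-segment comparisons using a Bradley-Terry-style cross-entropy loss (as in Christiano et al. 2017, which this work extends with demonstrations). All class names, shapes, and hyperparameters are illustrative assumptions, not the authors' actual architecture or training details.

# A minimal sketch of pairwise preference learning for a reward model.
# Assumed PyTorch implementation; the flattened observations and small MLP
# stand in for the convolutional reward network used on Atari frames.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Maps a single observation (flattened here for simplicity) to a scalar reward."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, segment_len, obs_dim) -> per-step rewards (batch, segment_len)
        return self.net(obs).squeeze(-1)


def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefs: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss on the probability that segment A is preferred over B.

    prefs is 1.0 when the human preferred segment A, 0.0 when they preferred B.
    The preference probability is a logistic function of the difference in
    summed predicted rewards over the two segments.
    """
    ret_a = model(seg_a).sum(dim=1)   # predicted return of segment A
    ret_b = model(seg_b).sum(dim=1)   # predicted return of segment B
    logits = ret_a - ret_b            # log-odds that A is preferred
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)


if __name__ == "__main__":
    # Toy example: 8 preference pairs of 25-step segments with 16-dim observations.
    model = RewardModel(obs_dim=16)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    seg_a, seg_b = torch.randn(8, 25, 16), torch.randn(8, 25, 16)
    prefs = torch.randint(0, 2, (8,)).float()
    loss = preference_loss(model, seg_a, seg_b, prefs)
    loss.backward()
    opt.step()
    print(f"preference loss: {loss.item():.3f}")

In the full method, the DQN agent would then be trained on the rewards predicted by this model rather than on the game score, with demonstrations providing an additional supervised signal.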
