arXiv Analytics


arXiv:1811.06521 [cs.LG]

Reward learning from human preferences and demonstrations in Atari

Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei

Published 2018-11-15 (Version 1)

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
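
As a rough illustration of the preference half of this setup, the sketch below fits a reward model to pairwise trajectory-segment comparisons using a Bradley-Terry-style cross-entropy loss (as in Christiano et al. 2017, which this work extends with demonstrations). All class names, shapes, and hyperparameters are illustrative assumptions, not the authors' actual architecture or training details.

# A minimal sketch of pairwise preference learning for a reward model.
# Assumed PyTorch implementation; the flattened observations and small MLP
# stand in for the convolutional reward network used on Atari frames.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Maps a single observation (flattened here for simplicity) to a scalar reward."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, segment_len, obs_dim) -> per-step rewards (batch, segment_len)
        return self.net(obs).squeeze(-1)


def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefs: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss on the probability that segment A is preferred over B.

    prefs is 1.0 when the human preferred segment A, 0.0 when they preferred B.
    The preference probability is a logistic function of the difference in
    summed predicted rewards over the two segments.
    """
    ret_a = model(seg_a).sum(dim=1)   # predicted return of segment A
    ret_b = model(seg_b).sum(dim=1)   # predicted return of segment B
    logits = ret_a - ret_b            # log-odds that A is preferred
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)


if __name__ == "__main__":
    # Toy example: 8 preference pairs of 25-step segments with 16-dim observations.
    model = RewardModel(obs_dim=16)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    seg_a, seg_b = torch.randn(8, 25, 16), torch.randn(8, 25, 16)
    prefs = torch.randint(0, 2, (8,)).float()
    loss = preference_loss(model, seg_a, seg_b, prefs)
    loss.backward()
    opt.step()
    print(f"preference loss: {loss.item():.3f}")

In the full method, the DQN agent would then be trained on the rewards predicted by this model rather than on the game score, with demonstrations providing an additional supervised signal.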
