arXiv:1208.0984 Abstract | arXiv Analytics

arXiv:1208.0984 [cs.LG]

APRIL: Active Preference-learning based Reinforcement Learning

Riad Akrour, Marc Schoenauer, Michèle Sebag

Published 2012-08-05, Version 1

This paper focuses on reinforcement learning (RL) with limited prior knowledge. In swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, which rules out both standard RL and inverse reinforcement learning. Even with limited expertise, the human expert is often still able to express preferences and rank the agent's demonstrations. Earlier work presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, thus enabling the agent to perform direct policy search. At each iteration, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration relative to the previous best one; the expert's ranking feedback enables the agent to refine the approximate policy return, and the process repeats. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and cancer treatment testbeds show that a couple of dozen rankings suffice to learn a competent policy.
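The iterative loop described in the abstract can be illustrated with a small toy sketch. It is not the APRIL algorithm itself: the expert is simulated by a hidden quality function, policies are plain 2-D parameter vectors, the approximate policy return is a linear model fitted with perceptron-style updates, and the next candidate is simply the sampled policy with the highest surrogate score rather than one chosen by the paper's active ranking criterion. All names (`true_quality`, `fit_surrogate`, `run`, `TARGET`) are hypothetical.

```python
import random

# Hypothetical optimum, known only to the simulated expert (an assumption
# made for this toy example; a real expert has no such explicit function).
TARGET = (0.7, -0.3)


def true_quality(policy):
    """Hidden quality used by the simulated expert to rank two policies."""
    return -sum((p - t) ** 2 for p, t in zip(policy, TARGET))


def features(policy):
    """Quadratic features so a linear surrogate can represent a peak."""
    x, y = policy
    return (x, y, x * x, y * y)


def surrogate_return(weights, policy):
    """Approximate policy return: a linear score over the features."""
    return sum(w * f for w, f in zip(weights, features(policy)))


def fit_surrogate(preferences, passes=50, lr=0.1):
    """Perceptron-style fit: push each preferred policy above the other one."""
    weights = [0.0] * 4
    for _ in range(passes):
        for winner, loser in preferences:
            if surrogate_return(weights, winner) <= surrogate_return(weights, loser):
                fw, fl = features(winner), features(loser)
                weights = [w + lr * (a - b) for w, a, b in zip(weights, fw, fl)]
    return weights


def run(n_queries=20, n_candidates=200, seed=0):
    rng = random.Random(seed)

    def sample():
        return (rng.uniform(-1, 1), rng.uniform(-1, 1))

    best = sample()
    preferences = []  # list of (preferred, other) pairs collected so far
    history = [true_quality(best)]
    for _ in range(n_queries):
        # Refine the approximate policy return from all rankings so far.
        weights = fit_surrogate(preferences)
        # Select a new candidate: greedy on the surrogate (not active ranking).
        candidates = [sample() for _ in range(n_candidates)]
        candidate = max(candidates, key=lambda p: surrogate_return(weights, p))
        # The simulated expert ranks the candidate against the previous best.
        if true_quality(candidate) > true_quality(best):
            preferences.append((candidate, best))
            best = candidate
        else:
            preferences.append((best, candidate))
        history.append(true_quality(best))
    return best, history, len(preferences)
```

Calling `run(n_queries=20)` issues exactly 20 ranking queries to the simulated expert. Because the best policy is replaced only when the expert prefers the new candidate, the quality of the retained policy never decreases from one iteration to the next.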

Journal: ECML PKDD 2012 7524 (2012) 116-131

Categories: cs.LG

Keywords: reinforcement learning, approximate policy return, active preference-learning, cancer treatment testbed, direct policy search

Tags: journal article

Related articles:

arXiv:1706.04711 [cs.LG] (Published 2017-06-15)

Reinforcement Learning under Model Mismatch

Aurko Roy, Huan Xu, Sebastian Pokutta

arXiv:1811.01483 [cs.LG] (Published 2018-11-05)

Contingency-Aware Exploration in Reinforcement Learning

Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, Honglak Lee

arXiv:1301.0601 [cs.LG] (Published 2012-12-12)