arXiv:2202.03481 [cs.LG]

A Ranking Game for Imitation Learning

Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum

Published 2022-02-07 (Version 1)

We propose a new framework for imitation learning: treating imitation as a two-player ranking-based Stackelberg game between a $\textit{policy}$ and a $\textit{reward}$ function. In this game, the reward agent learns to satisfy pairwise performance rankings within a set of policies, while the policy agent learns to maximize this reward. The game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods that learn from offline preferences. The Stackelberg formulation allows us to use optimization methods that take the game structure into account, leading to more sample-efficient and stable learning dynamics than existing IRL methods. We theoretically analyze the requirements on the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium. We use insights from this analysis to further increase the sample efficiency of the ranking game with automatically generated rankings or offline annotated rankings. Our experiments show that the proposed method achieves state-of-the-art sample efficiency and solves previously unsolvable tasks in the Learning from Observation (LfO) setting.
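The reward player's objective of "satisfying pairwise performance rankings" can be sketched with a standard Bradley-Terry-style cross-entropy ranking loss over trajectory returns. This is an illustrative sketch only, assuming a linear reward over state features; the function names, the linear parameterization, and the plain gradient step are assumptions, not the paper's implementation:

```python
import numpy as np

def ranking_loss(theta, traj_lo, traj_hi):
    """Cross-entropy (Bradley-Terry) ranking loss for one ranked pair:
    the summed reward of the higher-ranked trajectory should exceed
    that of the lower-ranked one. Trajectories are (T, d) feature arrays;
    the reward of a state is a linear function of its features (assumed)."""
    z = np.dot(traj_hi, theta).sum() - np.dot(traj_lo, theta).sum()
    # loss = -log P(hi preferred over lo) = -log sigmoid(z)
    return np.log1p(np.exp(-z))

def reward_step(theta, traj_lo, traj_hi, lr=0.1):
    """One gradient-descent step of the reward player on the ranking loss."""
    z = np.dot(traj_hi, theta).sum() - np.dot(traj_lo, theta).sum()
    sig = 1.0 / (1.0 + np.exp(z))  # = sigmoid(-z); dL/dz = -sigmoid(-z)
    grad = -sig * (traj_hi.sum(axis=0) - traj_lo.sum(axis=0))
    return theta - lr * grad
```

In the full Stackelberg game, the policy player would then maximize return under the updated reward `theta`, with the reward player treated as the leader anticipating that best response.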

Related articles:
arXiv:2309.02473 [cs.LG] (Published 2023-09-05)
A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges
arXiv:2206.04873 [cs.LG] (Published 2022-06-10)
Imitation Learning via Differentiable Physics
arXiv:1206.5290 [cs.LG] (Published 2012-06-20)
Imitation Learning with a Value-Based Prior