arXiv:2106.10075 [cs.LG]

Learning to Plan via a Multi-Step Policy Regression Method

Stefan Wagner, Michael Janschek, Tobias Uelwer, Stefan Harmeling

Published 2021-06-18, Version 1

We propose a new approach to increase inference performance in environments that require a specific sequence of actions in order to be solved. This is the case, for example, in maze environments, where ideally an optimal path is found. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method, called policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show a drastic speedup at inference time by successfully predicting sequences of actions from a single observation.
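To make the idea concrete, here is a minimal PyTorch sketch of such a multi-step distillation setup: a student network maps one observation to n action distributions (an n-dimensional policy vector) and is trained to match the next n actions taken by a pretrained A2C teacher. This is an illustration under assumptions, not the paper's exact PHR architecture or loss; the names (MultiStepPolicy, distillation_step), layer sizes, and dimensions are hypothetical.

    # Sketch assumptions: discrete actions, flat observations, and teacher
    # action sequences already collected from A2C rollouts.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiStepPolicy(nn.Module):
        """Student that predicts a horizon of n actions from one observation."""
        def __init__(self, obs_dim: int, n_actions: int, horizon: int):
            super().__init__()
            self.horizon = horizon
            self.n_actions = n_actions
            self.body = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
            )
            # One softmax head per future step -> n action distributions.
            self.heads = nn.Linear(128, horizon * n_actions)

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            logits = self.heads(self.body(obs))
            return logits.view(-1, self.horizon, self.n_actions)

    def distillation_step(student, optimizer, obs, teacher_actions):
        """One supervised update: fit all n action slots to the teacher's
        next n actions for the same observation."""
        logits = student(obs)                           # (B, n, A)
        loss = F.cross_entropy(
            logits.reshape(-1, student.n_actions),      # (B*n, A)
            teacher_actions.reshape(-1),                # (B*n,)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage with dummy data: batch of 32 observations, horizon of 4 actions.
    student = MultiStepPolicy(obs_dim=16, n_actions=6, horizon=4)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    obs = torch.randn(32, 16)
    teacher_actions = torch.randint(0, 6, (32, 4))
    print(distillation_step(student, opt, obs, teacher_actions))

At inference, a single forward pass over one observation yields all n actions, which is where a speedup over a one-step policy would come from in this kind of setup.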

Comments: Accepted at the 30th International Conference on Artificial Neural Networks (ICANN 2021)
Categories: cs.LG, cs.AI, cs.RO
Related articles:
arXiv:1907.01285 [cs.LG] (Published 2019-07-02)
Learning the Arrow of Time
arXiv:1706.09520 [cs.LG] (Published 2017-06-29)
Neural SLAM
arXiv:1911.03731 [cs.LG] (Published 2019-11-09)
Learning Internal Representations