arXiv:2106.10075 [cs.LG]

Learning to Plan via a Multi-Step Policy Regression Method

Stefan Wagner, Michael Janschek, Tobias Uelwer, Stefan Harmeling

Published 2021-06-18, Version 1

We propose a new approach to increase inference performance in environments that require a specific sequence of actions in order to be solved. This is the case, for example, in maze environments, where ideally an optimal path is found. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method, called policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show a drastic speedup at inference time by successfully predicting sequences of actions from a single observation.
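To make the idea concrete, here is a minimal PyTorch sketch of such a multi-step distillation setup: a student network maps one observation to n action distributions (an n-dimensional policy vector) and is trained to match the next n actions taken by a pretrained A2C teacher. This is an illustration under assumptions, not the paper's exact PHR architecture or loss; the names (MultiStepPolicy, distillation_step), layer sizes, and dimensions are hypothetical.

    # Sketch assumptions: discrete actions, flat observations, and teacher
    # action sequences already collected from A2C rollouts.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiStepPolicy(nn.Module):
        """Student that predicts a horizon of n actions from one observation."""
        def __init__(self, obs_dim: int, n_actions: int, horizon: int):
            super().__init__()
            self.horizon = horizon
            self.n_actions = n_actions
            self.body = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
            )
            # One softmax head per future step -> n action distributions.
            self.heads = nn.Linear(128, horizon * n_actions)

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            logits = self.heads(self.body(obs))
            return logits.view(-1, self.horizon, self.n_actions)

    def distillation_step(student, optimizer, obs, teacher_actions):
        """One supervised update: fit all n action slots to the teacher's
        next n actions for the same observation."""
        logits = student(obs)                           # (B, n, A)
        loss = F.cross_entropy(
            logits.reshape(-1, student.n_actions),      # (B*n, A)
            teacher_actions.reshape(-1),                # (B*n,)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage with dummy data: batch of 32 observations, horizon of 4 actions.
    student = MultiStepPolicy(obs_dim=16, n_actions=6, horizon=4)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    obs = torch.randn(32, 16)
    teacher_actions = torch.randint(0, 6, (32, 4))
    print(distillation_step(student, opt, obs, teacher_actions))

At inference, a single forward pass over one observation yields all n actions, which is where a speedup over a one-step policy would come from in this kind of setup.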

Comments: Accepted at the 30th International Conference on Artificial Neural Networks (ICANN 2021)
Categories: cs.LG, cs.AI, cs.RO
Related articles:
arXiv:1907.01285 [cs.LG] (Published 2019-07-02)
Learning the Arrow of Time
arXiv:1706.09520 [cs.LG] (Published 2017-06-29)
Neural SLAM
arXiv:1911.03731 [cs.LG] (Published 2019-11-09)
Learning Internal Representations