arXiv Analytics


arXiv:1809.06098 [cs.LG]

Policy Optimization via Importance Sampling

Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli

Published 2018-09-17 (Version 1)

Policy optimization is an effective reinforcement learning approach for solving continuous control tasks. Recent results have shown that alternating on-line and off-line optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel model-free policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation, and then we define a surrogate objective function which is optimized off-line using a batch of trajectories. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
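The off-line step described above can be pictured with a minimal sketch: an importance-sampling estimate of the target policy's expected return, penalized by the estimate's sample standard deviation, and maximized over the target parameters using a fixed batch of trajectories. The diagonal-Gaussian hyperpolicy, the penalty coefficient `delta`, and the simple variance penalty below are illustrative assumptions, not the paper's exact Rényi-based bound or implementation.

```python
# Illustrative sketch of a variance-penalized importance-sampling surrogate
# (parameter-based setting). Assumptions: diagonal-Gaussian hyperpolicy over
# policy parameters, a plain IS estimator, and a hypothetical penalty weight
# `delta`; the actual POIS bound and optimizer differ.
import numpy as np

def log_gaussian(theta, mean, std):
    """Log-density of a diagonal Gaussian hyperpolicy at parameter vectors theta."""
    return -0.5 * np.sum(((theta - mean) / std) ** 2 + np.log(2 * np.pi * std ** 2), axis=-1)

def surrogate(target_mean, behav_mean, std, thetas, returns, delta=1.0):
    """IS estimate of the target's expected return minus a variance penalty.

    thetas:  (N, d) policy parameters sampled from the behavioral hyperpolicy
    returns: (N,)   returns of the trajectories obtained with those parameters
    """
    log_w = log_gaussian(thetas, target_mean, std) - log_gaussian(thetas, behav_mean, std)
    w = np.exp(log_w)                                       # importance weights
    is_estimate = np.mean(w * returns)                      # off-line return estimate
    penalty = np.sqrt(np.var(w * returns) / len(returns))   # std of that estimate
    return is_estimate - delta * penalty

# Toy usage: evaluate the surrogate for a shifted target mean on a reused batch.
rng = np.random.default_rng(0)
d, N = 4, 200
behav_mean, std = np.zeros(d), np.ones(d)
thetas = rng.normal(behav_mean, std, size=(N, d))
returns = thetas @ np.ones(d) + rng.normal(size=N)          # toy returns
print(surrogate(behav_mean + 0.1, behav_mean, std, thetas, returns))
```

In this picture, the surrogate is maximized off-line over the target mean until the penalty term dominates, at which point new trajectories are collected from the updated hyperpolicy.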

Related articles:
arXiv:2002.05954 [cs.LG] (Published 2020-02-14)
Learning Functionally Decomposed Hierarchies for Continuous Control Tasks
arXiv:1910.13181 [cs.LG] (Published 2019-10-29)
Bridging the ELBO and MMD
arXiv:2501.13296 [cs.LG] (Published 2025-01-23)
Exploring Variance Reduction in Importance Sampling for Efficient DNN Training