arXiv Analytics


arXiv:1809.06098 [cs.LG]

Policy Optimization via Importance Sampling

Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli

Published 2018-09-17 (Version 1)

Policy optimization is an effective reinforcement learning approach for solving continuous control tasks. Recent results have shown that alternating on-line and off-line optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel model-free policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation, and then we define a surrogate objective function which is optimized off-line using a batch of trajectories. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
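The off-line step described above can be pictured with a minimal sketch: an importance-sampling estimate of the target policy's expected return, penalized by the estimate's sample standard deviation, and maximized over the target parameters using a fixed batch of trajectories. The diagonal-Gaussian hyperpolicy, the penalty coefficient `delta`, and the simple variance penalty below are illustrative assumptions, not the paper's exact Rényi-based bound or implementation.

```python
# Illustrative sketch of a variance-penalized importance-sampling surrogate
# (parameter-based setting). Assumptions: diagonal-Gaussian hyperpolicy over
# policy parameters, a plain IS estimator, and a hypothetical penalty weight
# `delta`; the actual POIS bound and optimizer differ.
import numpy as np

def log_gaussian(theta, mean, std):
    """Log-density of a diagonal Gaussian hyperpolicy at parameter vectors theta."""
    return -0.5 * np.sum(((theta - mean) / std) ** 2 + np.log(2 * np.pi * std ** 2), axis=-1)

def surrogate(target_mean, behav_mean, std, thetas, returns, delta=1.0):
    """IS estimate of the target's expected return minus a variance penalty.

    thetas:  (N, d) policy parameters sampled from the behavioral hyperpolicy
    returns: (N,)   returns of the trajectories obtained with those parameters
    """
    log_w = log_gaussian(thetas, target_mean, std) - log_gaussian(thetas, behav_mean, std)
    w = np.exp(log_w)                                       # importance weights
    is_estimate = np.mean(w * returns)                      # off-line return estimate
    penalty = np.sqrt(np.var(w * returns) / len(returns))   # std of that estimate
    return is_estimate - delta * penalty

# Toy usage: evaluate the surrogate for a shifted target mean on a reused batch.
rng = np.random.default_rng(0)
d, N = 4, 200
behav_mean, std = np.zeros(d), np.ones(d)
thetas = rng.normal(behav_mean, std, size=(N, d))
returns = thetas @ np.ones(d) + rng.normal(size=N)          # toy returns
print(surrogate(behav_mean + 0.1, behav_mean, std, thetas, returns))
```

In this picture, the surrogate is maximized off-line over the target mean until the penalty term dominates, at which point new trajectories are collected from the updated hyperpolicy.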

Related articles:
arXiv:2002.05954 [cs.LG] (Published 2020-02-14)
Learning Functionally Decomposed Hierarchies for Continuous Control Tasks
arXiv:1910.13181 [cs.LG] (Published 2019-10-29)
Bridging the ELBO and MMD
arXiv:2501.13296 [cs.LG] (Published 2025-01-23)
Exploring Variance Reduction in Importance Sampling for Efficient DNN Training