arXiv Analytics

arXiv:1810.09103 [cs.LG]

Actor-Expert: A Framework for using Action-Value Methods in Continuous Action Spaces

Sungsu Lim, Ajin Joseph, Lei Le, Yangchen Pan, Martha White

Published 2018-10-22 (Version 1)

Value-based approaches can be difficult to use in continuous action spaces, because an optimization problem must be solved to find the greedy action for the action-values. A common strategy has been to restrict the functional form of the action-values to be convex or quadratic in the actions, to simplify this optimization. Such restrictions, however, can prevent learning accurate action-values. In this work, we propose the Actor-Expert framework for value-based methods, which decouples action selection (Actor) from the action-value representation (Expert). The Expert uses Q-learning to update the action-values towards the optimal action-values, whereas the Actor learns to output the greedy action for the current action-values. We develop a Conditional Cross Entropy Method for the Actor, to learn the greedy action for a generically parameterized Expert, and provide a two-timescale analysis to validate its asymptotic behavior. We demonstrate in a toy domain with bimodal action-values that previous restrictive action-value methods fail, whereas the decoupled Actor-Expert with a more general action-value parameterization succeeds. Finally, we demonstrate that Actor-Expert performs as well as or better than these other methods on several benchmark continuous-action domains.
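For intuition, the following is a minimal sketch of how a Cross-Entropy Method search can approximate the greedy action of a fixed action-value function, in the spirit of the Actor component described in the abstract. The toy bimodal q_func, the one-dimensional action, and all hyperparameters (sample counts, iteration counts) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def q_func(state, actions):
    # Toy bimodal action-value function: two peaks near a = 1 and a = -1,
    # independent of the state for simplicity.
    return np.exp(-(actions - 1.0) ** 2) + 0.8 * np.exp(-(actions + 1.0) ** 2)

def cem_greedy_action(state, q, n_samples=64, n_elite=8, n_iters=10):
    """Approximate argmax_a q(state, a) by iteratively refitting a Gaussian
    proposal over actions to the highest-value sampled actions."""
    mu, sigma = 0.0, 2.0                                  # initial proposal over a 1-D action
    for _ in range(n_iters):
        actions = np.random.normal(mu, sigma, size=n_samples)
        values = q(state, actions)
        elite = actions[np.argsort(values)[-n_elite:]]    # keep the top-scoring actions
        mu, sigma = elite.mean(), elite.std() + 1e-6      # refit the proposal to the elites
    return mu

state = np.zeros(1)                                       # placeholder state
print("approximate greedy action:", cem_greedy_action(state, q_func))
```

In the full Actor-Expert algorithm the Actor is a parameterized, state-conditional version of this search, trained alongside an Expert that updates the action-values with Q-learning; the snippet only illustrates the greedy-action search for a single state.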

Related articles:
arXiv:2201.12332 [cs.LG] (Published 2022-01-28)
On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
arXiv:2211.13257 [cs.LG] (Published 2022-11-23)
Representation Learning for Continuous Action Spaces is Beneficial for Efficient Policy Learning
arXiv:2006.12367 [cs.LG] (Published 2020-06-22)
Adaptive Discretization for Adversarial Bandits with Continuous Action Spaces