arXiv:1911.04384 [cs.LG]
Keywords: function approximation, provably convergent off-policy actor-critic algorithm, first provably convergent off-policy actor-critic, emphasis critic, temporal difference learning