arXiv:2301.11270 [cs.LG]
Keywords: human feedback, principled reinforcement learning, wise comparison, true MLE, first sample complexity bound