arXiv:1406.6812 [cs.LG]

Keywords: online learning, finite Markov decision process, first regret bound, side information vector, optimal dynamic policy