arXiv Analytics

arXiv:1807.09089 [stat.ML]

Decision Variance in Online Learning

Sattar Vakili, Alexis Boukouvalas

Published 2018-07-24Version 1

Online learning has classically focused on the expected behaviour of learning policies. Recently, risk-averse online learning has gained much attention. In this paper, a risk-averse multi-armed bandit problem is studied in which the performance of policies is measured by the mean-variance of the rewards. The variance of the rewards depends on the variance of the underlying processes as well as on the variance of the player's decisions. The performance of two existing policies is analyzed, and new fundamental limitations on risk-averse learning are established. In particular, it is shown that although an $\mathcal{O}(\log T)$ distribution-dependent regret in time $T$ is achievable (similar to the risk-neutral setting), the worst-case (i.e., minimax) regret is lower bounded by $\Omega(T)$ (in contrast to the $\Omega(\sqrt{T})$ lower bound in the risk-neutral setting). The lower bound results are even stronger in the sense that they are proven for the case of online learning with full feedback.
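The abstract's key observation — that the variance of the reward sequence includes a contribution from the player's decisions, not just from the arms themselves — can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it simply compares the empirical mean-variance (here taken as variance minus a risk coefficient times the mean, a common parameterization in the mean-variance bandit literature; the paper's exact objective may differ) of a policy that commits to one arm versus a policy that alternates between two arms with different means.

```python
import random
import statistics

def empirical_mean_variance(rewards, rho=1.0):
    """Empirical mean-variance of a reward sequence: Var - rho * Mean.

    rho is an illustrative risk-tolerance coefficient (an assumption
    for this sketch, not taken from the paper)."""
    return statistics.pvariance(rewards) - rho * statistics.fmean(rewards)

random.seed(0)
T = 10_000
mu = [0.5, 0.4]   # two Gaussian arms with different means...
sigma = 0.1       # ...but identical per-arm standard deviation

# Policy A: always pull arm 0 -> reward variance comes only from the arm.
rewards_single = [random.gauss(mu[0], sigma) for _ in range(T)]

# Policy B: alternate between the arms -> the decision process itself adds
# variance, because the reward sequence mixes two distributions with
# different means: mixture variance = sigma^2 + (mu[0] - mu[1])^2 / 4.
rewards_mixed = [random.gauss(mu[t % 2], sigma) for t in range(T)]

print(statistics.pvariance(rewards_single))  # close to sigma^2 = 0.01
print(statistics.pvariance(rewards_mixed))   # close to 0.01 + 0.0025
print(empirical_mean_variance(rewards_single))
print(empirical_mean_variance(rewards_mixed))
```

The alternating policy pays a variance penalty of roughly $(\mu_0 - \mu_1)^2/4$ on top of the arms' own variance, which is the "decision variance" the title refers to: even arms with identical variances produce a higher-variance reward stream when the policy switches between them.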

Related articles:
arXiv:1311.0468 [stat.ML] (Published 2013-11-03)
Thompson Sampling for Online Learning with Linear Experts
arXiv:1810.02567 [stat.ML] (Published 2018-10-05)
Online Learning to Rank with Features
arXiv:1208.3728 [stat.ML] (Published 2012-08-18, updated 2014-05-24)
Online Learning with Predictable Sequences