{ "id": "1806.10745", "version": "v1", "published": "2018-06-28T02:50:38.000Z", "updated": "2018-06-28T02:50:38.000Z", "title": "Contextual bandits with surrogate losses: Margin bounds and efficient algorithms", "authors": [ "Dylan J. Foster", "Akshay Krishnamurthy" ], "categories": [ "cs.LG", "stat.ML" ], "abstract": "We introduce a new family of margin-based regret guarantees for adversarial contextual bandit learning. Our results are based on multiclass surrogate losses. Using the ramp loss, we derive a universal margin-based regret bound in terms of the sequential metric entropy for a benchmark class of real-valued regression functions. The new margin bound serves as a complete contextual bandit analogue of the classical margin bound from statistical learning. The result applies to large nonparametric classes, improving on the best known results for Lipschitz contextual bandits (Cesa-Bianchi et al., 2017) and, as a special case, generalizes the dimension-independent Banditron regret bound (Kakade et al., 2008) to arbitrary linear classes with smooth norms. On the algorithmic side, we use the hinge loss to derive an efficient algorithm with a $\\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regression functions. This provides the first hinge loss-based solution to the open problem of Abernethy and Rakhlin (2009). With an additional i.i.d. assumption we give a simple oracle-efficient algorithm whose regret matches our generic metric entropy-based bound for sufficiently complex nonparametric classes. Under realizability assumptions our results also yield classical regret bounds.", "revisions": [ { "version": "v1", "updated": "2018-06-28T02:50:38.000Z" } ], "analyses": { "keywords": [ "efficient algorithm", "surrogate losses", "regression functions", "dimension-independent banditron regret bound", "complete contextual bandit analogue" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }