arXiv:1602.05394 Abstract | arXiv Analytics

arXiv:1602.05394 [stat.ML]Abstract References Reviews Resources

Online optimization and regret guarantees for non-additive long-term constraints

Rodolphe Jenatton, Jim Huang, Cedric Archambeau

Published 2016-02-17Version 1

We consider online optimization in the 1-lookahead setting, where the objective does not decompose additively over the rounds of the online game. The resulting formulation enables us to deal with non-stationary and/or long-term constraints , which arise, for example, in online display advertising problems. We propose an on-line primal-dual algorithm for which we obtain dynamic cumulative regret guarantees. They depend on the convexity and the smoothness of the non-additive penalty, as well as terms capturing the smoothness with which the residuals of the non-stationary and long-term constraints vary over the rounds. We conduct experiments on synthetic data to illustrate the benefits of the non-additive penalty and show vanishing regret convergence on live traffic data collected by a display advertising platform in production.

Categories: stat.ML, cs.LG, math.OC, math.ST, stat.TH

Keywords: non-additive long-term constraints, online optimization, online display advertising problems, on-line primal-dual algorithm, dynamic cumulative regret guarantees

Related articles:

arXiv:1307.5944 [stat.ML] (Published 2013-07-23, updated 2016-01-19)

Online Optimization in Dynamic Environments

Eric C. Hall, Rebecca M. Willett

arXiv:1406.3190 [stat.ML] (Published 2014-06-12, updated 2014-12-14)

Online Optimization for Max-Norm Regularization

Jie Shen, Huan Xu, Ping Li

arXiv:2311.10023 [stat.ML] (Published 2023-11-16)

Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques

Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Amirhossein Asgharnia