arXiv:1504.02089 [cs.LG]
The Computational Power of Optimization in Online Learning
Published 2015-04-08 (Version 1)
We study the computational relationship between optimization, online learning, and learning in games. We propose an oracle-based model in which the online player is given access to an optimization oracle for the corresponding offline problem, and investigate the impact of such an oracle on the computational complexity of the online problem. First, we consider the fundamental setting of prediction with the advice of $N$ experts, augmented with an optimization oracle that can be used to compute, in constant time, the leading expert in retrospect at any point in time. In this setting, we give an algorithm that attains vanishing regret in total runtime of $\widetilde{O}(\sqrt{N})$. We also give a lower bound showing that, up to logarithmic factors, this running time cannot be improved in the optimization oracle model. These results attest that an optimization oracle gives rise to a quadratic speedup over the standard oracle-free setting, in which the required time for vanishing regret is $\widetilde{\Theta}(N)$. We then consider the closely related problem of learning in repeated $N \times N$ zero-sum games, in a setting where the players have access to oracles that compute, in constant time, the best response to any mixed strategy of their opponent. We give an efficient algorithm in this model that converges to the value of the game in total time $\widetilde{O}(\sqrt{N})$, again yielding a quadratic improvement over the state of the art in the oracle-free setting, in which $\widetilde{\Theta}(N)$ is known to be tight (Grigoriadis and Khachiyan, 1995). We then show that this running time is the best possible, thereby obtaining a tight characterization of the computational power of best-response oracles in zero-sum games.
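To make the oracle model concrete, the following is a minimal Python sketch (not taken from the paper) of the experts setting with a hindsight-leader oracle. The follow-the-leader player shown only illustrates how such an oracle is queried inside the online loop; it is not the $\widetilde{O}(\sqrt{N})$-time algorithm referred to in the abstract, and all names in it are hypothetical.

```python
import numpy as np

# Illustrative sketch of the oracle-based experts model: the online player
# never scans all N experts itself; it only queries an oracle that returns
# the leading expert in hindsight.  The follow-the-leader strategy below is
# for illustration only and does not guarantee vanishing regret in general.

rng = np.random.default_rng(0)
N, T = 1000, 200                  # number of experts, number of rounds
losses = rng.random((T, N))       # adversary's loss matrix, revealed one round at a time

cumulative = np.zeros(N)          # maintained here only to *simulate* the oracle

def optimization_oracle():
    """Return the leading expert in retrospect (modeled as a constant-time call)."""
    return int(np.argmin(cumulative))

player_loss = 0.0
for t in range(T):
    i_t = optimization_oracle()   # player's choice: follow the leader via the oracle
    player_loss += losses[t, i_t]
    cumulative += losses[t]       # environment reveals the round-t losses

best_loss = cumulative.min()      # loss of the best expert in hindsight
print(f"regret of this naive oracle-based player: {player_loss - best_loss:.2f}")
```

The point of the sketch is the interface: every piece of information the player extracts about the $N$ experts flows through calls to `optimization_oracle`, which is the resource whose computational power the paper quantifies.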