arXiv Analytics

arXiv:2010.11171 [cs.LG]

Data augmentation as stochastic optimization

Boris Hanin, Yi Sun

Published 2020-10-21 (Version 1)

We present a theoretical framework that recasts data augmentation as stochastic optimization over a sequence of time-varying proxy losses. This provides a unified approach to understanding techniques commonly thought of as data augmentation, including synthetic noise and label-preserving transformations, as well as more traditional ideas in stochastic optimization such as learning rate and batch size scheduling. We prove a time-varying Monro-Robbins theorem with rates of convergence, giving conditions on the learning rate and augmentation schedule under which augmented gradient descent converges. Special cases yield provably good joint schedules for augmentation with additive noise, minibatch SGD, and minibatch SGD with noise.
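As a rough illustration of the setup described above (not taken from the paper), the Python sketch below runs augmented gradient descent for linear regression: each step draws a fresh additive-noise augmentation of the inputs and takes a gradient step on the resulting proxy loss. The particular schedules eta_t = 1/(10+t) and sigma_t = 1/sqrt(t), the synthetic data, and all variable names are illustrative assumptions, not the schedules or conditions derived in the paper.

```python
# Minimal sketch (assumptions, not the paper's construction): augmented
# gradient descent for linear regression with additive-noise augmentation.
# At each step t we perturb the inputs with fresh Gaussian noise, forming a
# time-varying proxy loss L_t(w) = ||(X + noise_t) w - y||^2 / (2n), and take
# a gradient step with a decaying learning rate. The decaying schedules below
# are illustrative; the paper gives conditions under which such joint
# learning-rate/augmentation schedules make this procedure converge.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
T = 5000
for t in range(1, T + 1):
    eta_t = 1.0 / (10.0 + t)       # learning rate schedule (assumed)
    sigma_t = 1.0 / np.sqrt(t)     # augmentation noise schedule (assumed)

    # Additive-noise augmentation of the inputs for this step
    X_aug = X + sigma_t * rng.standard_normal(X.shape)

    # Gradient of the step-t proxy loss L_t(w) = ||X_aug w - y||^2 / (2n)
    grad = X_aug.T @ (X_aug @ w - y) / n
    w -= eta_t * grad

print("distance to true weights:", np.linalg.norm(w - w_true))
```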

Related articles:
arXiv:2107.08686 [cs.LG] (Published 2021-07-19)
Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints
arXiv:2007.00878 [cs.LG] (Published 2020-07-02)
On the Outsized Importance of Learning Rates in Local Update Methods
arXiv:2003.02389 [cs.LG] (Published 2020-03-05)
Comparing Rewinding and Fine-tuning in Neural Network Pruning