arXiv Analytics

arXiv:2010.11171 [cs.LG]

Data augmentation as stochastic optimization

Boris Hanin, Yi Sun

Published 2020-10-21 (Version 1)

We present a theoretical framework that recasts data augmentation as stochastic optimization over a sequence of time-varying proxy losses. This provides a unified approach to understanding techniques commonly thought of as data augmentation, including synthetic noise and label-preserving transformations, as well as more traditional ideas in stochastic optimization such as learning rate and batch size scheduling. We prove a time-varying Monro-Robbins theorem with rates of convergence, giving conditions on the learning rate and augmentation schedule under which augmented gradient descent converges. Special cases yield provably good joint schedules for augmentation with additive noise, minibatch SGD, and minibatch SGD with noise.
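As a rough illustration of the setup described above (not taken from the paper), the Python sketch below runs augmented gradient descent for linear regression: each step draws a fresh additive-noise augmentation of the inputs and takes a gradient step on the resulting proxy loss. The particular schedules eta_t = 1/(10+t) and sigma_t = 1/sqrt(t), the synthetic data, and all variable names are illustrative assumptions, not the schedules or conditions derived in the paper.

```python
# Minimal sketch (assumptions, not the paper's construction): augmented
# gradient descent for linear regression with additive-noise augmentation.
# At each step t we perturb the inputs with fresh Gaussian noise, forming a
# time-varying proxy loss L_t(w) = ||(X + noise_t) w - y||^2 / (2n), and take
# a gradient step with a decaying learning rate. The decaying schedules below
# are illustrative; the paper gives conditions under which such joint
# learning-rate/augmentation schedules make this procedure converge.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
T = 5000
for t in range(1, T + 1):
    eta_t = 1.0 / (10.0 + t)       # learning rate schedule (assumed)
    sigma_t = 1.0 / np.sqrt(t)     # augmentation noise schedule (assumed)

    # Additive-noise augmentation of the inputs for this step
    X_aug = X + sigma_t * rng.standard_normal(X.shape)

    # Gradient of the step-t proxy loss L_t(w) = ||X_aug w - y||^2 / (2n)
    grad = X_aug.T @ (X_aug @ w - y) / n
    w -= eta_t * grad

print("distance to true weights:", np.linalg.norm(w - w_true))
```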

Related articles:
arXiv:2107.08686 [cs.LG] (Published 2021-07-19)
Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints
arXiv:2007.00878 [cs.LG] (Published 2020-07-02)
On the Outsized Importance of Learning Rates in Local Update Methods
arXiv:2003.02389 [cs.LG] (Published 2020-03-05)
Comparing Rewinding and Fine-tuning in Neural Network Pruning