arXiv:2203.01572 [cs.LG]

Data Augmentation as Feature Manipulation: a story of desert cows and grass cows

Ruoqi Shen, Sébastien Bubeck, Suriya Gunasekar

Published 2022-03-03 (Version 1)

Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially enlarge the data set? Or is it about encouraging the model to satisfy certain invariances? In this work we consider another angle: we study the effect of data augmentation on the dynamics of the learning process. We find that data augmentation can alter the relative importance of various features, effectively making certain informative but hard-to-learn features more likely to be captured during training. Importantly, we show that this effect is more pronounced for non-linear models, such as neural networks. Our main contribution is a detailed analysis of the effect of data augmentation on the learning dynamics of a two-layer convolutional neural network in the recently proposed multi-view data model of Allen-Zhu and Li [2020]. We complement this analysis with further experimental evidence that data augmentation can be viewed as a form of feature manipulation.
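
The core claim above, that augmentation can reweight which features gradient-based training picks up, can be illustrated with a toy experiment. The Python sketch below is purely an illustrative assumption of ours, not the paper's two-layer CNN analysis or the multi-view data model: it trains a logistic-regression model on one "easy" (large-magnitude) and one "hard" (small-magnitude) feature, both predictive of the label, and shows how an augmentation that adds noise to the easy feature shifts the relative weight toward the hard feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative assumption, not the paper's setup):
# both features are predictive, but the "easy" one has much larger magnitude.
n = 200
y = rng.choice([-1.0, 1.0], size=n)
x_easy = 3.0 * y + 0.1 * rng.standard_normal(n)   # large signal, quickly learned
x_hard = 0.5 * y + 0.1 * rng.standard_normal(n)   # small signal, slowly learned

def train_logreg(xe, xh, y, steps=500, lr=0.1):
    """Gradient descent on the logistic loss with two scalar features."""
    X = np.stack([xe, xh], axis=1)                # shape (n, 2)
    w = np.zeros(2)
    for _ in range(steps):
        margins = y * (X @ w)
        # gradient of mean log(1 + exp(-margin)) w.r.t. w
        grad = -((y / (1.0 + np.exp(margins))) @ X) / len(y)
        w -= lr * grad
    return w

# Without augmentation the easy feature dominates the learned weights.
w_plain = train_logreg(x_easy, x_hard, y)

# "Augmentation": perturb the easy feature with heavy noise, weakening its
# relative signal so training leans more on the hard feature.
x_easy_aug = x_easy + 3.0 * rng.standard_normal(n)
w_aug = train_logreg(x_easy_aug, x_hard, y)

print("weight ratio hard/easy, no augmentation:  ", w_plain[1] / w_plain[0])
print("weight ratio hard/easy, with augmentation:", w_aug[1] / w_aug[0])
```

In this toy run the hard feature's relative weight is noticeably larger when the easy feature is perturbed, which is the linear-model analogue of the feature-manipulation effect the paper analyzes for non-linear networks.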

Related articles:
arXiv:2203.16481 [cs.LG] (Published 2022-03-30)
On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification
arXiv:2203.03304 [cs.LG] (Published 2022-03-07)
Regularising for invariance to data augmentation improves supervised learning
arXiv:1904.09135 [cs.LG] (Published 2019-04-19)
Data Augmentation Using GANs