arXiv Analytics

arXiv:2207.02349 [cs.CV]

Generalization to translation shifts: a study in architectures and augmentations

Suriya Gunasekar

Published 2022-07-05 (Version 1)

We provide a detailed evaluation of various image classification architectures (convolutional, vision transformer, and fully connected MLP networks) and data augmentation techniques towards generalization to large spatial translation shifts. We make the following observations: (a) In the absence of data augmentation, all architectures, including convolutional networks, suffer degradation in performance when evaluated on translated test distributions. Understandably, both the in-distribution accuracy and the degradation under shifts are significantly worse for non-convolutional architectures. (b) Across all architectures, even a minimal augmentation of $4$ pixel random crop improves the robustness of performance to much larger magnitude shifts of up to $1/4$ of image size ($8$-$16$ pixels) in the test data -- suggesting a form of meta generalization from augmentation. For non-convolutional architectures, while the absolute accuracy is still low, we see dramatic improvements in robustness to large translation shifts. (c) With a sufficiently advanced augmentation pipeline ($4$ pixel crop + RandAugment + Erasing + MixUp), all architectures can be trained to have competitive performance, both in terms of in-distribution accuracy and generalization to large translation shifts.
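The two operations at the heart of the study -- the $4$ pixel random-crop augmentation applied at training time, and the larger translation shifts applied to the test distribution -- can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function names and the zero-fill convention for vacated pixels are assumptions for the sketch.

```python
import numpy as np

def random_crop(image, pad=4, rng=None):
    """Standard "4 pixel random crop" augmentation: zero-pad an HxWxC
    image by `pad` pixels on every side, then crop a random HxW window.
    The result is the original image shifted by up to `pad` pixels."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]

def translate(image, dx, dy):
    """Test-time translation shift: move the image content by (dx, dy)
    pixels, filling the vacated region with zeros (fill convention is an
    assumption; the paper evaluates shifts up to 1/4 of the image size)."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    dst_y = slice(max(dy, 0), min(h, h + dy))
    dst_x = slice(max(dx, 0), min(w, w + dx))
    src_y = slice(max(-dy, 0), min(h, h - dy))
    src_x = slice(max(-dx, 0), min(w, w - dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out
```

For a $32 \times 32$ image, `random_crop(img, pad=4)` exposes the network to shifts of at most $4$ pixels during training, while `translate(img, 8, 8)` produces the larger $8$-$16$ pixel test shifts against which robustness is measured.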
