arXiv Analytics

arXiv:2207.02349 [cs.CV]

Generalization to translation shifts: a study in architectures and augmentations

Suriya Gunasekar

Published 2022-07-05 (Version 1)

We provide a detailed evaluation of various image classification architectures (convolutional, vision transformer, and fully connected MLP networks) and data augmentation techniques towards generalization to large spatial translation shifts. We make the following observations: (a) In the absence of data augmentation, all architectures, including convolutional networks, suffer degradation in performance when evaluated on translated test distributions. Understandably, both the in-distribution accuracy and the degradation under shifts are significantly worse for non-convolutional architectures. (b) Across all architectures, even a minimal augmentation of $4$ pixel random crop improves the robustness of performance to much larger magnitude shifts of up to $1/4$ of image size ($8$-$16$ pixels) in the test data -- suggesting a form of meta generalization from augmentation. For non-convolutional architectures, while the absolute accuracy is still low, we see dramatic improvements in robustness to large translation shifts. (c) With a sufficiently advanced augmentation pipeline ($4$ pixel crop + RandAugment + Erasing + MixUp), all architectures can be trained to have competitive performance, both in terms of in-distribution accuracy and generalization to large translation shifts.
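The two operations at the heart of the study -- the $4$ pixel random-crop augmentation applied at training time, and the larger translation shifts applied to the test distribution -- can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function names and the zero-fill convention for vacated pixels are assumptions for the sketch.

```python
import numpy as np

def random_crop(image, pad=4, rng=None):
    """Standard "4 pixel random crop" augmentation: zero-pad an HxWxC
    image by `pad` pixels on every side, then crop a random HxW window.
    The result is the original image shifted by up to `pad` pixels."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]

def translate(image, dx, dy):
    """Test-time translation shift: move the image content by (dx, dy)
    pixels, filling the vacated region with zeros (fill convention is an
    assumption; the paper evaluates shifts up to 1/4 of the image size)."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    dst_y = slice(max(dy, 0), min(h, h + dy))
    dst_x = slice(max(dx, 0), min(w, w + dx))
    src_y = slice(max(-dy, 0), min(h, h - dy))
    src_x = slice(max(-dx, 0), min(w, w - dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out
```

For a $32 \times 32$ image, `random_crop(img, pad=4)` exposes the network to shifts of at most $4$ pixels during training, while `translate(img, 8, 8)` produces the larger $8$-$16$ pixel test shifts against which robustness is measured.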
