arXiv:2106.15788 [cs.CV]

Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment

Di Wu, Siyuan Li, Zelin Zang, Kai Wang, Lei Shang, Baigui Sun, Hao Li, Stan Z. Li

Published 2021-06-30 (Version 1)

Self-supervised contrastive learning has demonstrated great potential in learning visual representations. Despite its success on various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios has not been fully explored. In this paper, we first point out that current contrastive methods are prone to memorizing background/foreground texture and therefore struggle to localize the foreground object. Our analysis suggests that learning to extract discriminative texture information and learning to localize are equally crucial for self-supervised pre-training under fine-grained scenarios. Based on these findings, we introduce Cross-view Saliency Alignment (CVSA), a contrastive learning framework that first crops and swaps the saliency regions of images as a novel form of view generation, and then guides the model to localize the foreground object via a cross-view alignment loss. Extensive experiments on four popular fine-grained classification benchmarks show that CVSA significantly improves the learned representation.
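The sketch below illustrates the two ingredients named in the abstract: a crop-and-swap view generation step that exchanges the saliency regions of two images, and a cross-view alignment loss on the resulting features. The function names, the (x, y, w, h) box format, and the negative cosine-similarity objective are assumptions made for illustration; this is not the authors' released implementation.

```python
# Minimal sketch of saliency crop-and-swap view generation and a
# cross-view alignment loss. Box format and loss choice are assumptions.
import torch
import torch.nn.functional as F


def crop_and_swap(img_a, img_b, box_a, box_b, patch_size=64):
    """Swap the saliency regions of two (C, H, W) image tensors.

    box_* = (x, y, w, h) saliency bounding boxes. Patches are resized to a
    common patch_size so each can be pasted back into the other image.
    """
    def cut(img, box):
        x, y, w, h = box
        patch = img[:, y:y + h, x:x + w].unsqueeze(0)
        return F.interpolate(patch, size=(patch_size, patch_size),
                             mode="bilinear", align_corners=False).squeeze(0)

    def paste(img, box, patch):
        x, y, w, h = box
        out = img.clone()
        resized = F.interpolate(patch.unsqueeze(0), size=(h, w),
                                mode="bilinear", align_corners=False)
        out[:, y:y + h, x:x + w] = resized.squeeze(0)
        return out

    patch_a, patch_b = cut(img_a, box_a), cut(img_b, box_b)
    # Each image receives the other's saliency region, yielding two novel views.
    return paste(img_a, box_a, patch_b), paste(img_b, box_b, patch_a)


def cross_view_alignment_loss(feat_view1, feat_view2):
    """Negative cosine similarity between projected features of the two views,
    encouraging the encoder to attend to the swapped foreground region."""
    z1 = F.normalize(feat_view1, dim=-1)
    z2 = F.normalize(feat_view2, dim=-1)
    return -(z1 * z2).sum(dim=-1).mean()


if __name__ == "__main__":
    img_a, img_b = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
    view_a, view_b = crop_and_swap(img_a, img_b, (30, 40, 96, 96), (50, 60, 80, 80))
    loss = cross_view_alignment_loss(torch.rand(8, 128), torch.rand(8, 128))
    print(view_a.shape, view_b.shape, loss.item())
```

In practice the saliency boxes would come from a saliency detector and the features from the backbone being pre-trained; the alignment term would be added to the usual contrastive objective.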

Related articles:
arXiv:1703.09983 [cs.CV] (Published 2017-03-29)
Iterative Object and Part Transfer for Fine-Grained Recognition
arXiv:1811.10770 [cs.CV] (Published 2018-11-27)
Generating Attention from Classifier Activations for Fine-grained Recognition
arXiv:1909.08950 [cs.CV] (Published 2019-09-19)
Count, Crop and Recognise: Fine-Grained Recognition in the Wild