arXiv Analytics


arXiv:2208.11718 [cs.CV]

gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window

Mocho Go, Hideyuki Tachibana

Published 2022-08-24 (Version 1)

Following its success in the language domain, the self-attention mechanism (the Transformer) has been adopted in the vision domain, where it has recently achieved great success. In parallel, the multi-layer perceptron (MLP) has also been explored as another stream for vision. These architectures, beyond traditional CNNs, have attracted much attention recently, and many methods have been proposed. To combine parameter efficiency and performance with the locality and hierarchy needed for image recognition, we propose gSwin, which merges the two streams: the Swin Transformer and (multi-head) gMLP. We show that gSwin achieves better accuracy than the Swin Transformer on three vision tasks (image classification, object detection, and semantic segmentation) with a smaller model size.
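The abstract describes gSwin only at a high level, so the following is a minimal, hypothetical sketch of the general idea it names: a gMLP-style spatial gating unit applied to a window of tokens, as in window-based (Swin-style) processing. It omits the hierarchical shifted-window partitioning and the multi-head variant, and names such as WindowGatedMLP, SpatialGatingUnit, and window_tokens are invented for this illustration; it is not the authors' implementation.

    # Hypothetical sketch: gMLP-style gating applied within one window of tokens.
    # Not the authors' gSwin code; shapes and names are assumptions for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpatialGatingUnit(nn.Module):
        def __init__(self, dim, seq_len):
            super().__init__()
            self.norm = nn.LayerNorm(dim // 2)
            # Linear mixing along the token (spatial) axis, as in gMLP.
            self.proj = nn.Linear(seq_len, seq_len)

        def forward(self, x):                        # x: (B, N, dim)
            u, v = x.chunk(2, dim=-1)                # split channels into two halves
            v = self.norm(v)
            v = self.proj(v.transpose(1, 2)).transpose(1, 2)
            return u * v                             # element-wise gating

    class WindowGatedMLP(nn.Module):
        def __init__(self, dim, window_tokens, expansion=4):
            super().__init__()
            hidden = dim * expansion
            self.norm = nn.LayerNorm(dim)
            self.fc_in = nn.Linear(dim, hidden)
            self.sgu = SpatialGatingUnit(hidden, window_tokens)
            self.fc_out = nn.Linear(hidden // 2, dim)

        def forward(self, x):                        # x: (B, window_tokens, dim)
            shortcut = x
            x = F.gelu(self.fc_in(self.norm(x)))
            x = self.sgu(x)
            return shortcut + self.fc_out(x)         # residual connection

    # Usage example: a 7x7 window of 96-dimensional tokens, batch of 2 windows.
    # block = WindowGatedMLP(dim=96, window_tokens=49)
    # out = block(torch.randn(2, 49, 96))            # -> (2, 49, 96)

In a full model, such blocks would be stacked in a hierarchical structure with window partitions shifted between layers, but those details are not specified in this abstract.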

Comments: 13 pages, 6 figures
Categories: cs.CV, cs.LG
Related articles:
arXiv:2405.12781 [cs.CV] (Published 2024-05-21)
Self-Supervised Modality-Agnostic Pre-Training of Swin Transformers
arXiv:2103.14030 [cs.CV] (Published 2021-03-25)
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu et al.
arXiv:2501.15656 [cs.CV] (Published 2025-01-26, updated 2025-01-31)
Classifying Deepfakes Using Swin Transformers