Rethinking Spatial Dimensions of Vision Transformers

Pages: 11936 - 11945

Published: Mar 30, 2021

Abstract

Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks, offering an alternative architecture to existing convolutional neural networks (CNNs). Because transformer-based architectures are still new to computer vision modeling, design conventions for an effective architecture have been less studied. From the successful design principles of CNNs, we...

Paper Details

Journal

arXiv (Cornell University)
