Rethinking Spatial Dimensions of Vision Transformers
Abstract
Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks as being an alternative architecture against the existing convolutional neural networks (CNN). Since the transformer-based architecture has been innovative for computer vision modeling, the design convention towards an effective architecture has been less studied yet. From the successful design principles of CNN, we...
Paper Details
Title
Rethinking Spatial Dimensions of Vision Transformers
Published Date
Mar 30, 2021
Journal
Pages
11936 - 11945
Citation AnalysisPro
You’ll need to upgrade your plan to Pro
Looking to understand the true influence of a researcher’s work across journals & affiliations?
- Scinapse’s Top 10 Citation Journals & Affiliations graph reveals the quality and authenticity of citations received by a paper.
- Discover whether citations have been inflated due to self-citations, or if citations include institutional bias.
Notes
History