Rethinking Spatial Dimensions of Vision Transformers

Pages: 11936 - 11945
Published: Mar 30, 2021
Abstract
Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks, serving as an alternative architecture to existing convolutional neural networks (CNNs). Since the transformer-based architecture is a recent innovation in computer vision modeling, design conventions for an effective architecture have not yet been well studied. From the successful design principles of CNNs, we...