Locality-Aware CTA Clustering for Modern GPUs

Published: Apr 4, 2017
Abstract
Cache is designed to exploit locality; however, the role of on-chip L1 data caches on modern GPUs is often awkward. The locality among global memory requests from different SMs (Streaming Multiprocessors) is predominantly harvested by the commonly-shared L2 with long access latency; while the in-core locality, which is crucial for performance delivery, is handled explicitly by user-controlled scratchpad memory. In this work, we disclose another...
Paper Details
Title
Locality-Aware CTA Clustering for Modern GPUs
Published Date
Apr 4, 2017
Citation AnalysisPro
  • Scinapse’s Top 10 Citation Journals & Affiliations graph reveals the quality and authenticity of citations received by a paper.
  • Discover whether citations have been inflated due to self-citations, or if citations include institutional bias.