Original paper

Fast implementation of DGEMM on Fermi GPU

Pages: 1 - 11
Published: Nov 8, 2011
Abstract
In this paper we present a thorough account of our experience tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization strategy is further guided by a performance model based on micro-architecture benchmarks. Our optimizations include software pipelining, use of...
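To make the idea of blocking at two levels concrete, here is a minimal CPU-side sketch in Python. It is not the paper's kernel and does not run on a GPU. The function name `blocked_dgemm` and the parameters `tile` and `reg` are hypothetical, and the default sizes are arbitrary rather than the values the authors chose. The outer tiles stand in for data staged in shared memory, and the small `acc` block stands in for per-thread register accumulators.

```python
def blocked_dgemm(A, B, tile=4, reg=2):
    """Return C = A * B using two-level blocking (illustrative only).

    A is m x k and B is k x n, both given as non-empty lists of lists
    of floats. `tile` and `reg` are assumed sizes, not tuned values.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]

    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                i1 = min(i0 + tile, m)
                j1 = min(j0 + tile, n)
                p1 = min(p0 + tile, k)

                # Level 1: copy one tile of A and one tile of B.
                # On a GPU these copies would live in shared memory.
                a_tile = [row[p0:p1] for row in A[i0:i1]]
                b_tile = [row[j0:j1] for row in B[p0:p1]]

                # Level 2: each reg x reg micro-block accumulates in a
                # small local array, the analogue of registers.
                for i in range(0, i1 - i0, reg):
                    for j in range(0, j1 - j0, reg):
                        ri = min(reg, i1 - i0 - i)
                        rj = min(reg, j1 - j0 - j)
                        acc = [[0.0] * rj for _ in range(ri)]
                        for p in range(p1 - p0):
                            for ii in range(ri):
                                a = a_tile[i + ii][p]
                                for jj in range(rj):
                                    acc[ii][jj] += a * b_tile[p][j + jj]
                        # Write the micro-block back into C once per tile.
                        for ii in range(ri):
                            for jj in range(rj):
                                C[i0 + i + ii][j0 + j + jj] += acc[ii][jj]
    return C
```

The point of the structure is data reuse: each element loaded into a tile is used by several micro-blocks, and each value held in `acc` is updated many times before it is written back. How large the tiles can be is limited by the capacity of each memory level, which is the constraint the abstract refers to.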