Fast implementation of DGEMM on Fermi GPU

Guangming Tan; Linchuan Li; Sean Triechle; Everett Phillips; Yungang Bao; Ninghui Sun

DOI: https://doi.org/10.1145/2063384.2063431

Pages: 1 - 11

Published: Nov 8, 2011

Abstract

In this paper we present our experience tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization strategy is further guided by a performance model based on micro-architecture benchmarks. Our optimizations include software pipelining, use of...
