
NVIDIA Tensor Core Programmability, Performance & Precision

Pages: 522 - 531
Published: May 1, 2018
Abstract
The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called the Tensor Core, that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 TFLOPS in mixed precision. In this paper, we investigate current approaches to program NVIDIA Tensor Cores, their performance and...
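As a rough illustration of the operation described in the abstract, the following Python sketch models one Tensor Core step, D = A·B + C on 4x4 tiles, with A and B rounded to FP16 and C and D kept in FP32, and reproduces the 125 TFLOPS peak figure from 640 cores. It is not NVIDIA's API or the authors' code. The function names `to_fp16`, `to_fp32`, `tensor_core_mma` and `peak_tflops` are hypothetical, rounding to FP32 after every addition is an assumed accumulation order (real hardware behavior is not modeled exactly), and the 1.53 GHz boost clock is an assumed V100 SXM2 value that the abstract does not state.

```python
import struct

N = 4  # Tensor Core tile size: 4x4 matrices


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c):
    """Simplified model of one Tensor Core operation: D = A*B + C.

    A and B are rounded to FP16; C and D are FP32. Each FP16*FP16 product
    is exactly representable in FP32. ASSUMPTION: the running sum is
    rounded to FP32 after every addition, in k order.
    Returns (D, number of fused multiply-adds performed).
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * N for _ in range(N)]
    fmas = 0
    for i in range(N):
        for j in range(N):
            acc = to_fp32(c[i][j])
            for k in range(N):
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
                fmas += 1
            d[i][j] = acc
    return d, fmas


def peak_tflops(tensor_cores: int = 640, clock_hz: float = 1.53e9) -> float:
    """Theoretical peak; clock_hz is an ASSUMED V100 SXM2 boost clock."""
    fma_per_clock = N * N * N  # one 4x4x4 MMA = 64 fused multiply-adds
    flops_per_fma = 2          # one multiply plus one add
    return tensor_cores * fma_per_clock * flops_per_fma * clock_hz / 1e12


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
    b = [[0.1] * N for _ in range(N)]
    zeros = [[0.0] * N for _ in range(N)]
    d, fmas = tensor_core_mma(identity, b, zeros)
    print(fmas)                           # 64
    print(d[0][0])                        # 0.0999755859375 (0.1 rounded to FP16)
    print(f"{peak_tflops():.1f} TFLOPS")  # 125.3 TFLOPS
```

Multiplying by the identity exposes only the input rounding: 0.1 comes back as its nearest FP16 value, which is the precision cost the "mixed precision" figure trades for throughput. The peak arithmetic is 640 × 64 × 2 × 1.53e9 ≈ 125.3e12 flop/s.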