A high-performance batched matrix multiplication framework for GPUs under unbalanced input distribution

Volume: 78, Issue: 2, Pages: 1741 - 1758
Published: Jun 21, 2021
Abstract
In the past few decades, general matrix multiplication (GEMM), the core component of the Basic Linear Algebra Subprograms (BLAS) library, has played a vital role in fields such as machine learning, image processing, and fluid dynamics. Because applications in these fields often decompose a problem into many smaller sub-problems, today's BLAS libraries provide batched GEMM routines to achieve high performance in this scenario....
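The abstract refers to the batched GEMM routines that modern BLAS libraries expose for many small, independent multiplications. As a point of reference only (this is not the paper's framework), the sketch below shows the fixed-size, strided-batched interface in cuBLAS; the matrix size, batch count, and fill values are illustrative assumptions. Note that this interface requires all matrices in the batch to share the same dimensions, which is exactly the uniformity the paper's "unbalanced input distribution" setting relaxes.

```c
/* Minimal sketch (not the paper's framework): one cuBLAS call that runs a
 * batch of small, uniformly sized GEMMs. Sizes and values are illustrative. */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 16;          /* each matrix is n x n        */
    const int batch = 1024;    /* number of independent GEMMs */
    const size_t elems = (size_t)n * n * batch;

    /* Host batches: A filled with 1.0, B with 2.0; C receives the results. */
    float *hA = (float *)malloc(elems * sizeof(float));
    float *hB = (float *)malloc(elems * sizeof(float));
    float *hC = (float *)malloc(elems * sizeof(float));
    for (size_t i = 0; i < elems; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, elems * sizeof(float));
    cudaMalloc(&dB, elems * sizeof(float));
    cudaMalloc(&dC, elems * sizeof(float));
    cudaMemcpy(dA, hA, elems * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, elems * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    /* One call launches all `batch` products C_i = alpha * A_i * B_i + beta * C_i;
     * consecutive matrices are n*n elements apart (the stride). */
    cublasSgemmStridedBatched(handle,
                              CUBLAS_OP_N, CUBLAS_OP_N,
                              n, n, n,
                              &alpha,
                              dA, n, (long long)n * n,
                              dB, n, (long long)n * n,
                              &beta,
                              dC, n, (long long)n * n,
                              batch);

    cudaMemcpy(hC, dC, elems * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0](0,0) = %f (expected %d)\n", hC[0], 2 * n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```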