Enabling Highly Efficient Batched Matrix Multiplications on SW26010 Many-core Processor

Lijuan Jiang; Chao Yang; Wenjing Ma

DOI: https://doi.org/10.1145/3378176

ACM Transactions on Architecture and Code Optimization (1.60)

Volume: 17, Issue: 1, Pages: 1 - 23

Published: Mar 4, 2020

Abstract

We present a systematic methodology for optimizing batched matrix multiplications on the SW26010 many-core processor of the Sunway TaihuLight supercomputer. Five surrogate algorithms and a machine learning–based algorithm selector are proposed to fully exploit the computing capability of the SW26010 and to cope with the sophisticated algorithm characteristics of batched matrix multiplications. Experiment results show that the algorithm selector is able to...
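For readers unfamiliar with the term, a batched matrix multiplication computes many independent products C[i] = A[i] · B[i] in one call, typically on small matrices. The sketch below is a plain reference version in Python, meant only to illustrate the operation. It is not one of the paper's five algorithms and does not model the SW26010. The names `matmul` and `batched_matmul` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an m x k matrix by a k x n matrix with plain nested loops."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def batched_matmul(a_batch: List[Matrix], b_batch: List[Matrix]) -> List[Matrix]:
    """Compute C[i] = A[i] * B[i] for every pair in the batch.

    Each product is independent of the others, which is the property an
    optimized implementation can exploit for parallelism.
    """
    if len(a_batch) != len(b_batch):
        raise ValueError("batches must have the same length")
    return [matmul(a, b) for a, b in zip(a_batch, b_batch)]


# A batch of two 2 x 2 products (illustrative data only).
a_batch = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
b_batch = [[[1.0, 0.0], [0.0, 1.0]], [[5.0, 6.0], [7.0, 8.0]]]
c_batch = batched_matmul(a_batch, b_batch)
```

Here the first product multiplies by the identity matrix and the second swaps the rows of its right-hand operand.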
