Threaded Multi-Core GEMM with MoA and Cache-Blocking: Preprint

Thomas, Stephen; Mullin, Lenore; Swirydowicz, Kasia; Khan, Rishi

Conference · Mon Feb 28 23:00:00 EST 2022

OSTI ID:1848079

A threaded multi-core implementation of GEMM, the high-performance dense matrix-matrix multiply kernel, is described. This kernel is widely implemented by vendors in the Basic Linear Algebra Subprograms (BLAS) library. The Mathematics of Arrays (MoA) paradigm due to Mullin (1988) results in contiguous memory accesses by employing outer-product forms. Our performance studies demonstrate that the MoA implementation of double-precision DGEMM, combined with optimal cache-blocking strategies, yields at least a 25% performance gain on the Intel Xeon Skylake processor over the vendor-supplied Intel MKL basic linear algebra libraries. Results are presented for the NREL Eagle supercomputer. The multi-core DGEMM achieves over 100 GFLOP/s with eight OpenMP threads.
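The two ideas named in the abstract, contiguous access through an outer-product style update and cache-blocking, can be illustrated with a minimal sketch. This is not the authors' MoA-derived kernel, which this record does not reproduce. The function names `gemm_blocked` and `gemm_reference`, the row-major flat-list layout, and the block size `bs` are all assumptions chosen for illustration.

```python
def gemm_blocked(a, b, c, m, n, k, bs=4):
    """Accumulate C += A * B on row-major flat lists.

    a holds m*k values, b holds k*n values, c holds m*n values.
    bs is a hypothetical block size; a tuned kernel would derive it
    from the cache sizes of the target processor.
    """
    for i0 in range(0, m, bs):          # block of rows of C
        i_end = min(i0 + bs, m)
        for p0 in range(0, k, bs):      # block of the shared dimension
            p_end = min(p0 + bs, k)
            for j0 in range(0, n, bs):  # block of columns of C
                j_end = min(j0 + bs, n)
                for i in range(i0, i_end):
                    row_c = i * n
                    for p in range(p0, p_end):
                        a_ip = a[i * k + p]   # scalar reused across the row
                        row_b = p * n
                        # Innermost loop walks one row of B and one row
                        # of C at unit stride (contiguous in memory).
                        for j in range(j0, j_end):
                            c[row_c + j] += a_ip * b[row_b + j]
    return c


def gemm_reference(a, b, m, n, k):
    """Textbook inner-product GEMM, used only to check the sketch."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i * k + p] * b[p * n + j]  # strided access to b
            c[i * n + j] = s
    return c


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]   # 2x2 matrix [[1, 2], [3, 4]]
    b = [5.0, 6.0, 7.0, 8.0]   # 2x2 matrix [[5, 6], [7, 8]]
    print(gemm_blocked(a, b, [0.0] * 4, 2, 2, 2, bs=1))
    # prints [19.0, 22.0, 43.0, 50.0]
```

In this sketch each `i0` block writes a disjoint set of rows of C, so the outermost loop is the natural place to divide work among threads in a compiled implementation. How the paper actually partitions work across its OpenMP threads is not stated in this record.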

Research Organization: National Renewable Energy Laboratory (NREL), Golden, CO (United States)

Sponsoring Organization: USDOE Office of Science (SC); USDOE National Nuclear Security Administration (NNSA); Exascale Computing Project; USDOE Office of Energy Efficiency and Renewable Energy (EERE)

DOE Contract Number: AC36-08GO28308

OSTI ID: 1848079

Report Number(s): NREL/CP-2C00-80530; MainId:43732; UUID:439f32b8-b062-4914-8cea-5a5aaae4f83f; MainAdminID:63971

Country of Publication: United States

Language: English

Similar Records

Improving the Performance of DGEMM with MoA and Cache-Blocking: Preprint

Conference · Tue Feb 08 23:00:00 EST 2022 · OSTI ID:1845269

Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA Tesla GPU Cluster

Conference · Mon Aug 31 00:00:00 EDT 2009 · OSTI ID:965387

Portable high performance GEMM-based level 3 BLAS

Conference · Thu Dec 30 23:00:00 EST 1993 · OSTI ID:54425

Related Subjects

MATHEMATICS AND COMPUTING
cache-blocking
contiguous memory
mathematics of arrays
shared-memory multi-threading