SRUMMA: A Matrix Multiplication Algorithm Suitable for Clusters and Scalable Shared Memory Systems

Krishnan, Manoj Kumar; Nieplocha, Jarek

doi:10.1109/IPDPS.2004.1303000

Conference · April 30, 2004

This paper describes a novel parallel algorithm that implements a dense matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon’s algorithm. It is suitable for clusters and shared memory systems. The approach differs from other parallel matrix multiplication algorithms in its explicit use of shared memory and remote memory access (RMA) communication rather than message passing. Experimental results on clusters (IBM SP, Linux-Myrinet) and shared memory systems (SGI Altix, Cray X1) demonstrate consistent performance advantages over ScaLAPACK pdgemm, the leading implementation of the parallel matrix multiplication algorithms used today. In the best case on the SGI Altix, the new algorithm performs 20 times better than ScaLAPACK for a matrix size of 1000 on 128 processors. The impact of zero-copy nonblocking RMA communication and shared memory communication on matrix multiplication performance on clusters is investigated.
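The abstract does not give the algorithm itself, so the following is only a rough serial sketch of the general idea it names: each process owns one block of the result and pulls the operand blocks it needs with a one-sided "get" instead of exchanging messages. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`get_block`, `multiply_add`, `block_matmul`), the square p x p block layout, and the staggered order in which blocks are fetched.

```python
# Serial sketch of an owner-computes block matrix multiply.
# Assumption: get_block() stands in for a one-sided (RMA) "get"; the names and
# the p x p block layout are illustrative and are not taken from the paper.

def get_block(M, bi, bj, bs):
    """Copy block (bi, bj) of size bs x bs out of M, as a remote get would."""
    return [row[bj * bs:(bj + 1) * bs] for row in M[bi * bs:(bi + 1) * bs]]

def multiply_add(C_blk, A_blk, B_blk):
    """Local update C_blk += A_blk * B_blk (stand-in for a serial dgemm call)."""
    bs = len(C_blk)
    for i in range(bs):
        for k in range(bs):
            a = A_blk[i][k]
            for j in range(bs):
                C_blk[i][j] += a * B_blk[k][j]

def block_matmul(A, B, p):
    """Multiply n x n matrices on a simulated p x p process grid (p divides n)."""
    n = len(A)
    assert n % p == 0, "this sketch assumes p divides n"
    bs = n // p
    C = [[0.0] * n for _ in range(n)]
    for pi in range(p):              # each (pi, pj) pair plays one "process"
        for pj in range(p):
            C_blk = [[0.0] * bs for _ in range(bs)]
            for s in range(p):
                # Staggered starting index (an illustrative choice) so that
                # simulated processes do not all request the same block first.
                k = (pi + pj + s) % p
                multiply_add(C_blk,
                             get_block(A, pi, k, bs),
                             get_block(B, k, pj, bs))
            for i in range(bs):      # write the owned block back into C
                C[pi * bs + i][pj * bs:(pj + 1) * bs] = C_blk[i]
    return C
```

In a real RMA setting the two `get_block` calls would be nonblocking transfers that could overlap with the previous `multiply_add`; this sketch runs them in sequence and only shows the data access pattern.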

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (US)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 914703

Report Number(s): PNNL-SA-45376

Country of Publication: United States

Language: English

Similar Records

Scaling Linear Algebra Kernels using Remote Memory Access

Conference · September 13, 2010 · OSTI ID:994036

Parallelization of the NAS Conjugate Gradient Benchmark Using the Global Arrays Shared Memory Programming Model

Conference · April 8, 2005 · OSTI ID:914702

Revealing the performance of MPI RMA implementations.

Conference · December 31, 2006 · Lect. Notes Comput. Sci. · OSTI ID:973468

Related Subjects

97 MATHEMATICS AND COMPUTING
99 GENERAL AND MISCELLANEOUS
ALGORITHMS
EFFICIENCY
MEMORY MANAGEMENT
PERFORMANCE
S CODES
