OSTI.GOV · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures

Journal Article · Parallel Computing
 [1];  [1];  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sparse matrix-matrix multiplication is a key kernel with applications in several domains, such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this work, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high-performance computing architectures. The performance of these algorithms depends on the data structures they use. We compare different types of accumulators in these algorithms and demonstrate the performance differences between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
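
To make the terms "accumulator" and "two-phase" in the abstract concrete, the following is a minimal serial sketch of row-wise sparse matrix-matrix multiplication on CSR inputs. It is not the kkSpGEMM implementation and makes no claim about the authors' code: the function names, the CSR tuple layout `(row_ptr, col_idx, values)`, and the `accumulator` parameter are hypothetical choices made for illustration. It also assumes that "two-phase" refers to a symbolic phase that computes the output structure and a numeric phase that fills in values, so the symbolic result can be reused when only the values change.

```python
def spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx):
    """Phase 1 (assumed meaning of "symbolic"): row pointer of C from patterns only."""
    n_rows = len(a_ptr) - 1
    c_ptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        cols = set()  # pattern accumulator for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            cols.update(b_idx[b_ptr[k]:b_ptr[k + 1]])
        c_ptr[i + 1] = c_ptr[i] + len(cols)
    return c_ptr


def spgemm_numeric(A, B, c_ptr, n_cols, accumulator="hash"):
    """Phase 2: fill column indices and values of C using the precomputed c_ptr.

    accumulator="hash"  -> dict keyed by column index (memory ~ row nonzeros)
    accumulator="dense" -> length-n_cols array (memory ~ number of columns)
    """
    a_ptr, a_idx, a_val = A
    b_ptr, b_idx, b_val = B
    n_rows = len(c_ptr) - 1
    c_idx = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]
    use_dense = accumulator == "dense"
    dense = [0.0] * n_cols if use_dense else None
    seen = [False] * n_cols if use_dense else None
    for i in range(n_rows):
        row = {}      # hash accumulator: column -> partial sum
        touched = []  # dense accumulator: columns written for this row
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                if use_dense:
                    if not seen[j]:
                        seen[j] = True
                        touched.append(j)
                    dense[j] += a_ik * b_val[q]
                else:
                    row[j] = row.get(j, 0.0) + a_ik * b_val[q]
        if use_dense:
            cols = sorted(touched)
            vals = [dense[j] for j in cols]
            for j in cols:  # reset only the touched entries for the next row
                dense[j] = 0.0
                seen[j] = False
        else:
            cols = sorted(row)
            vals = [row[j] for j in cols]
        start = c_ptr[i]
        c_idx[start:start + len(cols)] = cols
        c_val[start:start + len(vals)] = vals
    return c_idx, c_val


# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]] in CSR form.
A = ([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2], [1, 0], [4.0, 5.0])

c_ptr = spgemm_symbolic(A[0], A[1], B[0], B[1])  # computed once
c_idx, c_val = spgemm_numeric(A, B, c_ptr, n_cols=2)

# Same sparsity pattern, new values: only the numeric phase is repeated.
A2 = (A[0], A[1], [10.0, 20.0, 30.0])
c_idx2, c_val2 = spgemm_numeric(A2, B, c_ptr, n_cols=2, accumulator="dense")
```

In this sketch the two accumulators produce the same result and differ only in memory footprint and access pattern, which is the kind of trade-off the abstract says the algorithm choice depends on. The sketch is serial; the parallel, performance-portable variants studied in the article are not reproduced here.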

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000; NA-0003525
OSTI ID:
1466997
Alternate ID(s):
OSTI ID: 1548025
Report Number(s):
SAND2017-13679J; 659607
Journal Information:
Parallel Computing, Vol. 78, Issue C; ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 24 works
Citation information provided by Web of Science

References (10)

Towards Extreme-Scale Simulations for Low Mach Fluids with Second-Generation Trilinos journal December 2014
Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition journal September 1978
GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging journal January 2015
Optimizing Sparse Matrix—Matrix Multiplication for the GPU journal October 2015
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns journal December 2014
The Combinatorial BLAS: design, implementation, and applications journal May 2011
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication journal January 2016
Simultaneous Input and Output Matrix Partitioning for Outer-Product–Parallel Sparse Matrix-Matrix Multiplication journal January 2014
Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures journal August 2017
Brief Announcement: Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication conference January 2015

Cited By (3)

Adaptive sparse matrix-matrix multiplication on the GPU
  • Winter, Martin; Mlakar, Daniel; Zayer, Rhaleb
  • PPoPP '19: 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming https://doi.org/10.1145/3293883.3295701
conference February 2019
Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication journal January 2019
Preparing sparse solvers for exascale computing
  • Anzt, Hartwig; Boman, Erik; Falgout, Rob
  • Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 378, Issue 2166 https://doi.org/10.1098/rsta.2019.0053
journal January 2020