Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments

Deveci, Mehmet; Hammond, Simon David; Wolf, Michael M.; Rajamanickam, Sivasankaran

doi:10.2172/1435688


Technical Report · Mon Apr 02 04:00:00 EDT 2018

DOI: https://doi.org/10.2172/1435688 · OSTI ID: 1435688

Deveci, Mehmet ^[1]; Hammond, Simon David ^[1]; Wolf, Michael M. ^[1]; Rajamanickam, Sivasankaran ^[1]

Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Architectures with multiple classes of memory media are becoming a common part of mainstream supercomputer deployments. So-called multi-level memories offer differing characteristics for each memory component, including variation in bandwidth, latency, and capacity. This paper investigates the performance of sparse matrix multiplication kernels on two leading high-performance computing architectures: Intel's Knights Landing (KNL) processor and NVIDIA's Pascal GPU. We describe a data placement method and a chunking-based algorithm for our kernels that exploit the existence of the multiple memory spaces in each hardware platform. We evaluate the performance of these methods relative to standard algorithms that use the auto-caching mechanisms. Our results show that standard algorithms that exploit cache reuse performed as well as multi-memory-aware algorithms for architectures such as KNLs, where the memory subsystems have similar latencies. However, for architectures such as GPUs, where memory subsystems differ significantly in both bandwidth and latency, multi-memory-aware methods are crucial for good performance. In addition, our new approaches permit the user to run problems that require larger capacities than the fastest memory of each compute node without depending on the software-managed cache mechanisms.
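The abstract does not spell out the chunking-based algorithm, so the following is only a minimal sketch of the general idea, not the report's method. It assumes a row-wise (Gustavson-style) product C = A * B in which B is processed one block of rows at a time, with each block sized to fit a hypothetical fast-memory budget. The names `spgemm_rows`, `chunked_spgemm`, and `fast_capacity_nnz` are illustrative, and the copy into `fast_chunk` merely stands in for an explicit transfer into a fast memory space.

```python
def spgemm_rows(a_rows, b_chunk, row_lo, row_hi, out):
    """Accumulate A[:, row_lo:row_hi] * B[row_lo:row_hi, :] into out.

    Matrices are lists of rows; each row is a dict {column: value}.
    b_chunk only needs to be indexable by k for row_lo <= k < row_hi.
    """
    for i, a_row in enumerate(a_rows):
        acc = out[i]  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            if row_lo <= k < row_hi:
                for j, b_kj in b_chunk[k].items():
                    acc[j] = acc.get(j, 0.0) + a_ik * b_kj


def chunked_spgemm(a_rows, b_rows, fast_capacity_nnz):
    """Multiply A * B, staging B through 'fast memory' in row chunks.

    fast_capacity_nnz is a hypothetical budget: the number of nonzeros
    of B assumed to fit in the fast memory at one time.
    """
    out = [dict() for _ in a_rows]
    n = len(b_rows)
    row_lo = 0
    while row_lo < n:
        # Grow the chunk until the budget is hit; always take one row
        # so that a single oversized row cannot stall the loop.
        row_hi, nnz = row_lo, 0
        while row_hi < n and (
            row_hi == row_lo or nnz + len(b_rows[row_hi]) <= fast_capacity_nnz
        ):
            nnz += len(b_rows[row_hi])
            row_hi += 1
        # Stand-in for an explicit copy of the chunk into fast memory.
        fast_chunk = {k: dict(b_rows[k]) for k in range(row_lo, row_hi)}
        spgemm_rows(a_rows, fast_chunk, row_lo, row_hi, out)
        row_lo = row_hi
    return out


if __name__ == "__main__":
    A = [{0: 1.0, 2: 2.0}, {1: 3.0}]          # 2 x 3
    B = [{1: 4.0}, {0: 5.0}, {0: 6.0, 1: 7.0}]  # 3 x 2
    print(chunked_spgemm(A, B, fast_capacity_nnz=1))
```

Because each chunk contributes an independent partial sum to C, the result does not depend on the chunk size; only the amount of data resident in the fast memory at any moment changes. A real implementation would also have to decide where A and C live, which this sketch ignores.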

Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); USDOE Laboratory Directed Research and Development (LDRD) Program

DOE Contract Number: AC04-94AL85000; NA0003525

OSTI ID: 1435688

Report Number(s): SAND2018--3428R; 662552

Country of Publication: United States

Language: English

Similar Records

Resource-aware compression

Patent · Mon Jan 02 23:00:00 EST 2023 · OSTI ID:1986992

Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training

Conference · Fri Jun 23 00:00:00 EDT 2023 · OSTI ID:1988131

Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors

Journal Article · Fri Jun 01 00:00:00 EDT 2007 · SIAM Review (SIREV) Journal · OSTI ID:961524

Related Subjects

97 MATHEMATICS AND COMPUTING
