OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Fast sparse matrix-vector multiplication by exploiting variable block structure

Technical Report
DOI: https://doi.org/10.2172/891708 · OSTI ID: 891708

We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix, A, into a sum, A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, unaligned block compressed sparse row (UBCSR) format. The classical alternative approach of storing A in a block compressed sparse row (BCSR) format yields limited performance gains because it imposes a particular alignment of the matrix non-zero structure, leading to extra work from explicitly padded zeros. Combining splitting and UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. Using application test matrices, we show empirically that speedups can be as high as 2.1x over not blocking at all, and as high as 1.8x over the standard BCSR implementation used in prior work. When performance does not improve, split UBCSR can still significantly reduce matrix storage. Through extensive experiments, we further show that the empirically optimal number of splittings s and the block size for each matrix term A_i will in practice depend on the matrix and hardware platform. Our data lay a foundation for future development of fully automated methods for tuning these parameters.
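The padding cost that the abstract attributes to aligned BCSR can be illustrated with a small sketch. It is not taken from the report. It assumes the common BCSR convention that every r x c block starts at a row index divisible by r and a column index divisible by c, and that every touched block is stored densely. The helper names `bcsr_fill_ratio` and `dense_block` are hypothetical.

```python
def bcsr_fill_ratio(nonzeros, r, c):
    """Return stored values / true nonzeros under aligned r x c blocking.

    Assumption (not from the report): block rows start at multiples of r
    and block columns at multiples of c, and each touched block is stored
    in full, explicit zeros included.
    """
    nonzeros = set(nonzeros)
    # A nonzero at (i, j) falls into the aligned block (i // r, j // c).
    blocks = {(i // r, j // c) for (i, j) in nonzeros}
    return len(blocks) * r * c / len(nonzeros)


def dense_block(i0, j0, r, c):
    """Coordinates of a fully dense r x c block whose top-left corner is (i0, j0)."""
    return [(i, j) for i in range(i0, i0 + r) for j in range(j0, j0 + c)]


aligned = dense_block(0, 0, 2, 2)  # corner coincides with the 2 x 2 grid
shifted = dense_block(1, 1, 2, 2)  # same shape, moved one row and one column

print(bcsr_fill_ratio(aligned, 2, 2))  # 1.0: no padding needed
print(bcsr_fill_ratio(shifted, 2, 2))  # 4.0: four aligned blocks are touched
```

In this toy case the shifted block stores 16 values for 4 true nonzeros. That is the kind of extra work a format without the alignment constraint is meant to avoid.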

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
891708
Report Number(s):
UCRL-TR-213454; TRN: US200622%%314
Country of Publication:
United States
Language:
English
