Fast sparse matrix-vector multiplication by exploiting variable block structure

Vuduc, R W; Moon, H

doi:10.2172/891708

Technical Report · Thu Jul 07 00:00:00 EDT 2005

DOI: https://doi.org/10.2172/891708 · OSTI ID: 891708

We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix, $A$, into a sum, $A_1 + A_2 + \cdots + A_s$, where each term is stored in a new data structure, unaligned block compressed sparse row (UBCSR) format. The classical alternative approach of storing $A$ in a block compressed sparse row (BCSR) format yields limited performance gains because it imposes a particular alignment of the matrix non-zero structure, leading to extra work from explicitly padded zeros. Combining splitting and UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. Using application test matrices, we show empirically that speedups can be as high as 2.1x over not blocking at all, and as high as 1.8x over the standard BCSR implementation used in prior work. When performance does not improve, split UBCSR can still significantly reduce matrix storage. Through extensive experiments, we further show that the empirically optimal number of splittings $s$ and the block size for each matrix term $A_i$ will in practice depend on the matrix and hardware platform. Our data lay a foundation for future development of fully automated methods for tuning these parameters.
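The padding effect described in the abstract can be illustrated with a small Python sketch. This is not the report's UBCSR data structure or code: the function names (`build_blocks`, `block_spmv`, `stored_values`), the dictionary-based block storage, the greedy rule for placing unaligned blocks, and the hand-made split into two terms are all assumptions chosen for brevity. The sketch counts how many values are stored (nonzeros plus explicit zero padding) when a tiny matrix is blocked with an aligned grid, with unaligned blocks, and as a two-term sum with a different block size per term.

```python
# Illustrative sketch only: names, the greedy blocking rule, and the dict-based
# layout are assumptions for this example, not the report's UBCSR format.

def build_blocks(entries, r, c, aligned):
    """Group nonzeros {(i, j): value} into r x c blocks padded with zeros.

    aligned=True  : block origins must be multiples of (r, c), as in BCSR.
    aligned=False : a block may start at any uncovered nonzero (greedy rule).
    Returns {(i0, j0): r x c list of lists}.
    """
    blocks, covered = {}, set()
    for (i, j) in sorted(entries):
        if (i, j) in covered:
            continue
        i0, j0 = ((i // r) * r, (j // c) * c) if aligned else (i, j)
        block = blocks.setdefault((i0, j0), [[0.0] * c for _ in range(r)])
        for di in range(r):
            for dj in range(c):
                key = (i0 + di, j0 + dj)
                if key in entries and key not in covered:
                    block[di][dj] = entries[key]
                    covered.add(key)
    return blocks


def stored_values(blocks, r, c):
    """Number of stored values, counting explicitly padded zeros."""
    return len(blocks) * r * c


def block_spmv(blocks, x, n_rows):
    """Compute y = A x from the block dictionary; padded zeros add nothing."""
    y = [0.0] * n_rows
    for (i0, j0), block in blocks.items():
        for di, row in enumerate(block):
            for dj, v in enumerate(row):
                # A block may hang over the matrix edge; skip that padding.
                if i0 + di < n_rows and j0 + dj < len(x):
                    y[i0 + di] += v * x[j0 + dj]
    return y


# 6 x 6 matrix: one 2 x 2 dense block on the grid, one off the grid,
# and one isolated nonzero.
dense_part = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0,
              (3, 3): 5.0, (3, 4): 6.0, (4, 3): 7.0, (4, 4): 8.0}
stray_part = {(5, 0): 9.0}
full = {**dense_part, **stray_part}
n = 6
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Reference result computed entry by entry.
y_ref = [0.0] * n
for (i, j), v in full.items():
    y_ref[i] += v * x[j]

bcsr_blocks = build_blocks(full, 2, 2, aligned=True)
unaligned_blocks = build_blocks(full, 2, 2, aligned=False)

# Hand-made split A = A_1 + A_2: 2 x 2 unaligned blocks plus a 1 x 1 remainder.
a1_blocks = build_blocks(dense_part, 2, 2, aligned=False)
a2_blocks = build_blocks(stray_part, 1, 1, aligned=False)
y_split = [u + v for u, v in zip(block_spmv(a1_blocks, x, n),
                                 block_spmv(a2_blocks, x, n))]

print("nonzeros:", len(full))
print("aligned 2x2 stored values:", stored_values(bcsr_blocks, 2, 2))
print("unaligned 2x2 stored values:", stored_values(unaligned_blocks, 2, 2))
print("split stored values:",
      stored_values(a1_blocks, 2, 2) + stored_values(a2_blocks, 1, 1))
```

In this toy case the off-grid dense block straddles four aligned blocks, so the aligned layout stores far more padding than the unaligned one, and giving the isolated nonzero its own 1 x 1 term removes the remaining padding. A real implementation would use compressed index arrays and register-tiled kernels rather than Python dictionaries, and would choose the split and block sizes by tuning rather than by hand.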

Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: W-7405-ENG-48

OSTI ID: 891708

Report Number(s): UCRL-TR-213454; TRN: US200622%%314

Country of Publication: United States

Language: English

Similar Records

Tensor Contraction and Operation Minimization for Extreme Scale Computational Chemistry

Technical Report · Wed Feb 17 00:00:00 EST 2021 · OSTI ID:891708

Sabin, Gerald; Sadayappan, P.

Sparse triangular solves for ILU revisited: data layout crucial to better performance

Journal Article · Tue Nov 01 00:00:00 EDT 2011 · International Journal of High Performance Computing Applications · OSTI ID:891708

Smith, B; Zhang, H

Scientific Computing Kernels on the Cell Processor

Journal Article · Wed Apr 04 00:00:00 EDT 2007 · International Journal of Parallel Programming · OSTI ID:891708

Williams, Samuel W; Shalf, John; Oliker, Leonid; +3 more

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ALIGNMENT
IMPLEMENTATION
MATRICES
PERFORMANCE
SIMULATION
STORAGE
TUNING
