OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Sparse Matrix-Vector Multiplication on Multicore and Accelerators

Abstract

This chapter consolidates recent work on the development of high-performance multicore and accelerator-based implementations of sparse matrix-vector multiplication (SpMV). As an object of study, SpMV is an interesting computation for two key reasons. First, it appears widely in applications in scientific and engineering computing, financial and economic modeling, and information retrieval, among others, and is therefore of great practical interest. Second, it is simple to describe yet challenging to implement well, since its performance is limited by a variety of factors, including low computational intensity, potentially highly irregular memory access behavior, and a strong input dependence that may be known only at run time. Thus, we believe SpMV is both practically important and instructive for understanding the algorithmic and implementation principles necessary to make effective use of state-of-the-art systems.
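To make the kernel concrete, below is a minimal C sketch of SpMV over the common compressed sparse row (CSR) format. It illustrates the computation the chapter studies and is not the authors' tuned code. The indirect read x[colidx[j]] is the source of the irregular memory access the abstract mentions, and roughly two words of memory traffic per multiply-add is why computational intensity is low.

    #include <stdio.h>

    /* y = A*x for an m-row sparse matrix A in CSR format:
     * rowptr[i]..rowptr[i+1] delimit the nonzeros of row i;
     * colidx[j] and val[j] give each nonzero's column and value. */
    void spmv_csr(int m, const int *rowptr, const int *colidx,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
                sum += val[j] * x[colidx[j]];   /* irregular read of x */
            y[i] = sum;
        }
    }

    int main(void)
    {
        /* 3x3 example: [[4,0,1],[0,2,0],[1,0,3]] */
        int rowptr[] = {0, 2, 3, 5};
        int colidx[] = {0, 2, 1, 0, 2};
        double val[] = {4, 1, 2, 1, 3};
        double x[] = {1, 1, 1}, y[3];
        spmv_csr(3, rowptr, colidx, val, x, y);
        printf("%g %g %g\n", y[0], y[1], y[2]);  /* prints: 5 2 4 */
        return 0;
    }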

Authors:
 Williams, Samuel W. [1]; Bell, Nathan [2]; Choi, Jee Whan [3]; Garland, Michael [2]; Oliker, Leonid [1]; Vuduc, Richard [3]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. NVIDIA Research, Santa Clara, CA (United States)
  3. Georgia Inst. of Technology, Atlanta, GA (United States)
Publication Date:
2010
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1407092
DOE Contract Number:
AC02-05CH11231
Resource Type:
Book
Resource Relation:
Book Title: Scientific Computing with Multicore and Accelerators
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Williams, Samuel W., Bell, Nathan, Choi, Jee Whan, Garland, Michael, Oliker, Leonid, and Vuduc, Richard. Sparse Matrix-Vector Multiplication on Multicore and Accelerators. United States: N. p., 2010. Web. doi:10.1201/b10376-8.
Williams, Samuel W., Bell, Nathan, Choi, Jee Whan, Garland, Michael, Oliker, Leonid, & Vuduc, Richard. Sparse Matrix-Vector Multiplication on Multicore and Accelerators. United States. doi:10.1201/b10376-8.
Williams, Samuel W., Bell, Nathan, Choi, Jee Whan, Garland, Michael, Oliker, Leonid, and Vuduc, Richard. 2010. "Sparse Matrix-Vector Multiplication on Multicore and Accelerators". United States. doi:10.1201/b10376-8. https://www.osti.gov/servlets/purl/1407092.
@incollection{osti_1407092,
title = {Sparse Matrix-Vector Multiplication on Multicore and Accelerators},
author = {Williams, Samuel W. and Bell, Nathan and Choi, Jee Whan and Garland, Michael and Oliker, Leonid and Vuduc, Richard},
abstractNote = {This chapter consolidates recent work on the development of high-performance multicore and accelerator-based implementations of sparse matrix-vector multiplication (SpMV). As an object of study, SpMV is an interesting computation for two key reasons. First, it appears widely in applications in scientific and engineering computing, financial and economic modeling, and information retrieval, among others, and is therefore of great practical interest. Second, it is simple to describe yet challenging to implement well, since its performance is limited by a variety of factors, including low computational intensity, potentially highly irregular memory access behavior, and a strong input dependence that may be known only at run time. Thus, we believe SpMV is both practically important and instructive for understanding the algorithmic and implementation principles necessary to make effective use of state-of-the-art systems.},
booktitle = {Scientific Computing with Multicore and Accelerators},
doi = {10.1201/b10376-8},
place = {United States},
year = {2010}
}

Availability:
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this book.

Similar Records:
  • We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV), one of the most heavily used kernels in scientific computing, across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, as well as the highly multithreaded Sun Niagara and heterogeneous STI Cell. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms. (A row-parallel SpMV sketch in the spirit of these studies appears after this list.)
  • We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV), one of the most heavily used kernels in scientific computing, across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
  • We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV), one of the most heavily used kernels in scientific computing, across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
  • The critical bottlenecks in the implementation of the conjugate gradient algorithm on distributed memory computers are the communication requirements of the sparse matrix-vector multiply and of the vector recurrences. The authors describe the data distribution and communication patterns of five general implementations, whose realizations demonstrate that the cost of communication can be overcome to a much larger extent than is often assumed. Their results also apply to more general settings for matrix-vector products, both sparse and dense. (A schematic CG iteration marking these two communication points appears after this list.)
  • The manycore paradigm shift, and the resulting change in modern computer architectures, has made the development of optimal numerical routines extremely challenging. In this work, we target the development of numerical algorithms and implementations for the Xeon Phi coprocessor architecture. In particular, we examine and optimize the general and symmetric matrix-vector multiplication routines (gemv/symv), which are among the most heavily used linear algebra kernels in many important engineering and physics applications. We describe a successful approach to addressing the challenges of this problem, starting with our algorithm design, performance analysis, and programming model, and moving to kernel optimization. Our goal, by targeting low-level and easy-to-understand fundamental kernels, is to develop new optimization strategies that can be effective elsewhere on manycore coprocessors, and to show significant performance improvements compared to existing state-of-the-art implementations. Therefore, in addition to the new optimization strategies, analysis, and optimal performance results, we finally present the significance of using these routines and strategies to accelerate higher-level numerical algorithms for the eigenvalue problem (EVP) and the singular value decomposition (SVD), which are themselves foundational for many important applications. (A plain reference gemv, for comparison with such tuned kernels, appears below.)
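The three multicore SpMV entries above all start from the same baseline: partition the matrix rows across threads, then layer optimizations such as NUMA-aware data placement, blocking, and software prefetch on top. As a hedged illustration only, not any of the papers' tuned code, this C/OpenMP sketch shows that row-parallel baseline; the names are ours.

    #include <omp.h>
    #include <stdio.h>

    /* Row-parallel CSR SpMV: schedule(static) gives each thread a
     * contiguous block of rows, so writes to y are disjoint and the
     * inner loop needs no synchronization. */
    void spmv_csr_omp(int m, const int *rowptr, const int *colidx,
                      const double *val, const double *x, double *y)
    {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
                sum += val[j] * x[colidx[j]];
            y[i] = sum;
        }
    }

    int main(void)
    {
        /* same 3x3 example matrix as the CSR sketch above */
        int rowptr[] = {0, 2, 3, 5};
        int colidx[] = {0, 2, 1, 0, 2};
        double val[] = {4, 1, 2, 1, 3};
        double x[] = {1, 1, 1}, y[3];
        spmv_csr_omp(3, rowptr, colidx, val, x, y);
        printf("%g %g %g\n", y[0], y[1], y[2]);  /* prints: 5 2 4 */
        return 0;
    }

Compiled with -fopenmp, the static schedule keeps each thread streaming through its own portion of val and rowptr, which is the starting point the cited studies then tune per architecture.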
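To make the conjugate gradient entry's two bottlenecks concrete, here is a minimal serial CG in C, our own illustration with a dense 2x2 stand-in for the sparse matrix. In a distributed-memory run, the matrix-vector product requires neighbor (halo) exchanges of the source vector, and each dot product requires a global reduction; the comments mark both spots.

    #include <math.h>
    #include <stdio.h>

    #define N 2  /* tiny symmetric positive definite example system */

    /* Dot product: in distributed CG this is the global allreduce
     * that serializes the vector-recurrence side of the iteration. */
    static double dot(const double *a, const double *b)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += a[i] * b[i];
        return s;
    }

    int main(void)
    {
        double A[N][N] = {{4, 1}, {1, 3}};   /* stand-in for sparse A */
        double b[N] = {1, 2};
        double x[N] = {0, 0}, r[N], p[N], Ap[N];

        for (int i = 0; i < N; i++) { r[i] = b[i]; p[i] = r[i]; }
        double rr = dot(r, r);

        for (int it = 0; it < 50 && sqrt(rr) > 1e-12; it++) {
            /* Ap = A*p: the SpMV; distributed versions exchange halo
             * entries of p with neighboring processes here. */
            for (int i = 0; i < N; i++) {
                Ap[i] = 0.0;
                for (int j = 0; j < N; j++) Ap[i] += A[i][j] * p[j];
            }
            double alpha = rr / dot(p, Ap);       /* allreduce #1 */
            for (int i = 0; i < N; i++) x[i] += alpha * p[i];
            for (int i = 0; i < N; i++) r[i] -= alpha * Ap[i];
            double rr_new = dot(r, r);            /* allreduce #2 */
            for (int i = 0; i < N; i++) p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }
        printf("x = (%g, %g)\n", x[0], x[1]);  /* ~(0.0909, 0.6364) */
        return 0;
    }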
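Finally, the gemv routine the Xeon Phi entry optimizes computes y = alpha*A*x + beta*y for a dense matrix A. For reference, and only as a textbook baseline rather than the paper's optimized kernel, an untuned C version looks like this:

    #include <stdio.h>

    /* Reference gemv: y = alpha*A*x + beta*y for a row-major m-by-n
     * dense matrix. Tuned kernels add register and cache blocking,
     * vectorization, and threading; the arithmetic is unchanged. */
    void gemv(int m, int n, double alpha, const double *A,
              const double *x, double beta, double *y)
    {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A[i * n + j] * x[j];
            y[i] = alpha * sum + beta * y[i];
        }
    }

    int main(void)
    {
        double A[] = {1, 2, 3, 4};      /* 2x2, row-major */
        double x[] = {1, 1}, y[] = {0, 0};
        gemv(2, 2, 1.0, A, x, 0.0, y);
        printf("%g %g\n", y[0], y[1]);  /* prints: 3 7 */
        return 0;
    }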