OSTI.GOV — U.S. Department of Energy
Office of Scientific and Technical Information

Title: STOMP: Statistical Techniques for Optimizing and Modeling Performance of blocked sparse matrix vector multiplication

Authors:
Monteiro, S.; Wong, D.; Iandola, F.
Publication Date:
2016-06-22
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1332463
Report Number(s):
LLNL-CONF-695657
DOE Contract Number:
AC52-07NA27344
Resource Type:
Conference
Resource Relation:
Conference: 28th International Symposium on Computer Architecture and High Performance Computing, Los Angeles, CA, United States, Oct 26-28, 2016
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE

Citation Formats

Monteiro, S, Wong, D, and Iandola, F. STOMP: Statistical Techniques for Optimizing and Modeling Performance of blocked sparse matrix vector multiplication. United States: N. p., 2016. Web.
Monteiro, S, Wong, D, & Iandola, F. STOMP: Statistical Techniques for Optimizing and Modeling Performance of blocked sparse matrix vector multiplication. United States.
Monteiro, S, Wong, D, and Iandola, F. 2016. "STOMP: Statistical Techniques for Optimizing and Modeling Performance of blocked sparse matrix vector multiplication". United States. https://www.osti.gov/servlets/purl/1332463.
@article{osti_1332463,
title = {STOMP: Statistical Techniques for Optimizing and Modeling Performance of blocked sparse matrix vector multiplication},
author = {Monteiro, S and Wong, D and Iandola, F},
place = {United States},
year = {2016},
month = {6}
}

Other availability:
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar Records:
  • No abstract prepared.
  • Obtaining highly accurate predictions of the properties of light atomic nuclei with the configuration interaction (CI) approach requires computing a few extremal eigenpairs of the many-body nuclear Hamiltonian matrix. In the Many-body Fermion Dynamics for nuclei (MFDn) code, a block eigensolver is used for this purpose. Because of the large size of the sparse matrices involved, a significant fraction of the time spent on the eigenvalue computations is associated with the multiplication of a sparse matrix (and the transpose of that matrix) with multiple vectors (SpMM and SpMM-T). Existing implementations of SpMM and SpMM-T significantly underperform expectations. Thus, in this paper, we present and analyze optimized implementations of SpMM and SpMM-T. We base our implementation on the compressed sparse blocks (CSB) matrix format and target systems with multi-core architectures. We develop a performance model that allows us to understand and estimate the performance characteristics of our SpMM kernel implementations, and we demonstrate the efficiency of our implementation on a series of real-world matrices extracted from MFDn. In particular, we obtain a 3-4x speedup on the requisite operations over good implementations based on the commonly used compressed sparse row (CSR) matrix format. The improvements in the SpMM kernel suggest we may attain roughly a 40% speedup in the overall execution time of the block eigensolver used in MFDn. (A sketch of the CSR SpMM baseline this abstract compares against appears after this list.)
  • Abstract not provided.
  • Many iterative schemes in scientific applications require the multiplication of a sparse matrix by a vector. This kernel has been studied mainly on vector processors and shared-memory parallel computers. In this paper, we address the implementation issues that arise when using a shared virtual memory system on a distributed-memory parallel computer. We study in detail the impact of loop distribution schemes in order to design an efficient algorithm. (A simple loop-distribution sketch appears after this list.)
  • We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV)--one of the most heavily used kernels in scientific computing--across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, as well as the highly multithreaded Sun Niagara and the heterogeneous STI Cell. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms. (A basic OpenMP-parallel SpMV kernel is sketched after this list.)
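
Below is a minimal sketch, in C, of the kind of CSR-based SpMM baseline the MFDn abstract above compares against: one sparse matrix multiplied by a block of k vectors in a single pass. The struct layout, names, and row-major storage of the vector block are illustrative assumptions, not the paper's implementation.

#include <stddef.h>

/* Hypothetical CSR storage: row_ptr has n_rows + 1 entries; col_idx and
   vals hold the nonzeros of each row contiguously. */
typedef struct {
    size_t  n_rows;
    size_t *row_ptr;
    size_t *col_idx;
    double *vals;
} csr_t;

/* Y = A * X, where X and Y are n-by-k blocks of vectors stored row-major
   (the k values belonging to one matrix row are consecutive). Touching
   all k vectors per nonzero amortizes the irregular access to A. */
void spmm_csr(const csr_t *A, const double *X, double *Y, size_t k)
{
    for (size_t i = 0; i < A->n_rows; ++i) {
        for (size_t c = 0; c < k; ++c)
            Y[i * k + c] = 0.0;
        for (size_t nz = A->row_ptr[i]; nz < A->row_ptr[i + 1]; ++nz) {
            double a = A->vals[nz];
            const double *x = &X[A->col_idx[nz] * k];
            for (size_t c = 0; c < k; ++c)
                Y[i * k + c] += a * x[c];
        }
    }
}

Streaming all k vectors per nonzero reuses each matrix entry k times, which is why SpMM can beat k separate SpMV calls; the transpose product SpMM-T is the awkward case for CSR (updates scatter across rows of Y), which is part of what motivates the CSB format the paper adopts.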
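The loop-distribution question raised in the shared-virtual-memory abstract can be made concrete with one simple scheme: split the row loop so that each worker gets roughly the same number of nonzeros rather than the same number of rows. This is a generic sketch over assumed CSR inputs and hypothetical names, not the schemes evaluated in that paper.

#include <stddef.h>

/* Partition rows across nthreads workers by equalizing nonzero counts.
   split must have nthreads + 1 entries; worker t then owns the row
   range [split[t], split[t+1]). */
void balance_rows(size_t n_rows, const size_t *row_ptr,
                  int nthreads, size_t *split)
{
    size_t total = row_ptr[n_rows];   /* total nonzero count */
    size_t row = 0;
    split[0] = 0;
    for (int t = 1; t < nthreads; ++t) {
        /* advance until this row's end crosses the t-th equal share */
        size_t target = total * (size_t)t / (size_t)nthreads;
        while (row < n_rows && row_ptr[row + 1] < target)
            ++row;
        split[t] = row;
    }
    split[nthreads] = n_rows;
}

A block (contiguous) distribution like this favors locality in the source vector; a cyclic distribution trades that locality for balance when nonzeros cluster in a few rows, which is exactly the kind of trade-off a loop-distribution study weighs.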
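Finally, the multicore SpMV study above evaluates far more elaborate optimizations (blocking, prefetching, per-architecture tuning), but its starting point is a row-parallel kernel of the shape below. A hedged sketch assuming CSR inputs and OpenMP 3.0+ (for the unsigned loop variable), compiled with -fopenmp; it is not the paper's code.

#include <stddef.h>
#include <omp.h>

/* Row-parallel CSR SpMV: threads own disjoint rows, so each writes a
   disjoint slice of y and no synchronization is needed. The dynamic
   schedule with a modest chunk size hedges against rows with very
   uneven nonzero counts. Argument names are illustrative. */
void spmv_csr_omp(size_t n_rows, const size_t *row_ptr,
                  const size_t *col_idx, const double *vals,
                  const double *x, double *y)
{
    #pragma omp parallel for schedule(dynamic, 256)
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t nz = row_ptr[i]; nz < row_ptr[i + 1]; ++nz)
            sum += vals[nz] * x[col_idx[nz]];
        y[i] = sum;
    }
}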