
On the performance and energy efficiency of sparse linear algebra on GPUs

Journal Article · International Journal of High Performance Computing Applications
 [1];  [1];  [2]
  1. University of Tennessee, Knoxville, USA
  2. University of Tennessee, Knoxville, USA, Oak Ridge National Laboratory, USA, University of Manchester, UK

In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study as it combines an interesting mix of sparse and dense linear algebra operations that is typical for complex simulation applications, and allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of sparse computations on supercomputers.
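The difference between the SpMV and the block-wise SpMM described in the abstract can be illustrated with a minimal sketch. It is written in plain Python for a matrix in CSR format and is not the paper's GPU kernel; the function names `csr_spmv` and `csr_spmm`, the CSR storage choice, and the small test matrix are assumptions made for illustration only. The point it shows is that the SpMM loads each nonzero once and reuses it for every vector in the block, whereas repeated SpMV calls reload the matrix for each vector.

```python
# Illustrative sketch only: plain-Python CSR kernels, not the paper's GPU code.
# Function names and the example matrix are hypothetical.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a CSR matrix A and one dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[p] * x[col_idx[p]]
    return y


def csr_spmm(row_ptr, col_idx, vals, X):
    """Return Y = A @ X for a CSR matrix A and a row-major block X of k vectors."""
    k = len(X[0])
    Y = [[0.0] * k for _ in range(len(row_ptr) - 1)]
    for i in range(len(Y)):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[p]              # each nonzero is loaded once ...
            x_row = X[col_idx[p]]
            for j in range(k):       # ... and reused for all k vectors
                Y[i][j] += a * x_row[j]
    return Y


# 3x3 example: A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]

# Block of two vectors, stored as the columns of X.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, -1.0]]

Y = csr_spmm(row_ptr, col_idx, vals, X)

# The same result obtained with one SpMV per vector (matrix traversed twice).
columns = [csr_spmv(row_ptr, col_idx, vals, [row[j] for row in X])
           for j in range(2)]
```

Both routes give the same numbers; the SpMM simply amortizes the reads of `vals` and `col_idx` over the k vectors, which is one plausible reading of why the abstract reports the SpMM moving past the memory-bound limit of the SpMV.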

Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0010042
OSTI ID:
1437692
Journal Information:
International Journal of High Performance Computing Applications, Vol. 31, Issue 5; ISSN 1094-3420
Publisher:
SAGE Publications
Country of Publication:
United States
Language:
English
