In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study as it combines an interesting mix of sparse and dense linear algebra operations that is typical for complex simulation applications, and allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of sparse computations on supercomputers.
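The SpMM kernel the abstract describes multiplies one sparse matrix by a block of vectors at once. The sketch below is not the paper's CUDA implementation; it is a minimal pure-Python illustration, assuming a standard CSR layout (`vals`, `col_idx`, `row_ptr` are conventional names, not from the paper), of why the blocked product can exceed SpMV's memory-bound limit: each nonzero of the matrix is loaded once and reused across all vectors in the block.

```python
def spmv_csr(vals, col_idx, row_ptr, x):
    # y = A @ x for a CSR matrix: every nonzero is read once per vector,
    # so performance is bound by streaming the matrix from memory.
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmm_csr(vals, col_idx, row_ptr, X):
    # Y = A @ X for a block of b vectors stored row-major (n x b).
    # Each nonzero of A is loaded once and applied to all b vectors,
    # amortizing the matrix traffic that limits a single SpMV.
    n, b = len(row_ptr) - 1, len(X[0])
    Y = [[0.0] * b for _ in range(n)]
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, j = vals[k], col_idx[k]
            for c in range(b):
                Y[i][c] += a * X[j][c]
    return Y

# A = [[4, 0, 1], [0, 2, 0], [1, 0, 3]] in CSR form
vals, col_idx, row_ptr = [4.0, 1.0, 2.0, 1.0, 3.0], [0, 2, 1, 0, 2], [0, 2, 3, 5]
X = [[1.0, 1.0], [0.0, 2.0], [0.0, 3.0]]  # a block of two right-hand-side vectors
print(spmm_csr(vals, col_idx, row_ptr, X))  # → [[4.0, 7.0], [0.0, 4.0], [1.0, 10.0]]
```

In a LOBPCG iteration the block `X` holds all current eigenvector approximations, which is why the solver benefits directly from an SpMM kernel of this shape.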
Anzt, Hartwig, et al. "On the performance and energy efficiency of sparse linear algebra on GPUs." International Journal of High Performance Computing Applications, vol. 31, no. 5, Oct. 2016. https://doi.org/10.1177/1094342016672081
Anzt, Hartwig, Tomov, Stanimire, & Dongarra, Jack (2016). On the performance and energy efficiency of sparse linear algebra on GPUs. International Journal of High Performance Computing Applications, 31(5). https://doi.org/10.1177/1094342016672081
Anzt, Hartwig, Tomov, Stanimire, and Dongarra, Jack, "On the performance and energy efficiency of sparse linear algebra on GPUs," International Journal of High Performance Computing Applications 31, no. 5 (2016), https://doi.org/10.1177/1094342016672081
@article{osti_1437692,
author = {Anzt, Hartwig and Tomov, Stanimire and Dongarra, Jack},
title = {On the performance and energy efficiency of sparse linear algebra on GPUs},
annote = {In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study as it combines an interesting mix of sparse and dense linear algebra operations that is typical for complex simulation applications, and allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of sparse computations on supercomputers.},
doi = {10.1177/1094342016672081},
url = {https://www.osti.gov/biblio/1437692},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = {5},
volume = {31},
place = {United States},
publisher = {SAGE Publications},
year = {2016},
month = {10}}