A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations
Abstract
As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
- Authors:
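- Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; Buluc, Aydin; Shao, Meiyue; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.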
- Michigan State Univ., East Lansing, MI (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
- Iowa State Univ., Ames, IA (United States). Dept. of Physics and Astronomy
- Publication Date:
- Research Org.:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Nuclear Physics (NP); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- OSTI Identifier:
- 1379875
- Grant/Contract Number:
- AC02-05CH11231; SC0008485; FG02-87ER40371
- Resource Type:
- Accepted Manuscript
- Journal Name:
- IEEE Transactions on Parallel and Distributed Systems
- Additional Journal Information:
- Journal Volume: 28; Journal Issue: 6; Journal ID: ISSN 1045-9219
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Sparse matrix multiplication; block eigensolver; configuration interaction; extended roofline model; tall-skinny matrices
Citation Formats
Aktulga, Hasan Metin, Afibuzzaman, Md., Williams, Samuel, Buluc, Aydin, Shao, Meiyue, Yang, Chao, Ng, Esmond G., Maris, Pieter, and Vary, James P. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations. United States: N. p., 2017. Web. doi:10.1109/TPDS.2016.2630699.
Aktulga, Hasan Metin, Afibuzzaman, Md., Williams, Samuel, Buluc, Aydin, Shao, Meiyue, Yang, Chao, Ng, Esmond G., Maris, Pieter, & Vary, James P. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations. United States. https://doi.org/10.1109/TPDS.2016.2630699
Aktulga, Hasan Metin, Afibuzzaman, Md., Williams, Samuel, Buluc, Aydin, Shao, Meiyue, Yang, Chao, Ng, Esmond G., Maris, Pieter, and Vary, James P. Thu . "A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations". United States. https://doi.org/10.1109/TPDS.2016.2630699. https://www.osti.gov/servlets/purl/1379875.
@article{osti_1379875,
title = {A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations},
author = {Aktulga, Hasan Metin and Afibuzzaman, Md. and Williams, Samuel and Buluc, Aydin and Shao, Meiyue and Yang, Chao and Ng, Esmond G. and Maris, Pieter and Vary, James P.},
abstractNote = {As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM$^T$ by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.},
doi = {10.1109/TPDS.2016.2630699},
journal = {IEEE Transactions on Parallel and Distributed Systems},
number = 6,
volume = 28,
place = {United States},
year = {2017},
month = {6}
}