A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations
Journal Article · IEEE Transactions on Parallel and Distributed Systems
- Michigan State Univ., East Lansing, MI (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
- Iowa State Univ., Ames, IA (United States). Dept. of Physics and Astronomy
As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM) and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and its transpose operation, SpMM^T, by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
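To make the kernels named in the abstract concrete, the following is a minimal sketch of SpMM, SpMM^T, and a tall-skinny block inner product. It assumes the baseline CSR layout rather than the CSB format the article develops, and it is serial and unoptimized. The function names (`spmm_csr`, `spmm_t_csr`, `block_inner_product`) and the small test matrix are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
def spmm_csr(row_ptr, col_idx, vals, X):
    """Y = A X for a CSR matrix A and a dense block X with m columns."""
    n_rows = len(row_ptr) - 1
    m = len(X[0])
    Y = [[0.0] * m for _ in range(n_rows)]
    for i in range(n_rows):
        y_i = Y[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, x_j = vals[k], X[col_idx[k]]
            for c in range(m):          # one nonzero is reused for all m vectors
                y_i[c] += a * x_j[c]
    return Y


def spmm_t_csr(row_ptr, col_idx, vals, X, n_cols):
    """Y = A^T X using the same CSR arrays; updates scatter into rows of Y."""
    m = len(X[0])
    Y = [[0.0] * m for _ in range(n_cols)]
    for i in range(len(row_ptr) - 1):
        x_i = X[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, y_j = vals[k], Y[col_idx[k]]
            for c in range(m):
                y_j[c] += a * x_i[c]
    return Y


def block_inner_product(X, Y):
    """Small dense matrix X^T Y for tall-skinny blocks X and Y (n rows each)."""
    p, q = len(X[0]), len(Y[0])
    G = [[0.0] * q for _ in range(p)]
    for x_row, y_row in zip(X, Y):
        for a in range(p):
            for b in range(q):
                G[a][b] += x_row[a] * y_row[b]
    return G


# Hypothetical 3x3 symmetric matrix [[2,0,1],[0,3,0],[1,0,4]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 1.0, 4.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # block of two vectors

AX = spmm_csr(row_ptr, col_idx, vals, X)
ATX = spmm_t_csr(row_ptr, col_idx, vals, X, n_cols=3)
G = block_inner_product(X, AX)
```

In this sketch each matrix nonzero is loaded once and applied to all vectors in the block, which is the data reuse that motivates block eigensolvers over single-vector methods. The transpose kernel writes to scattered output rows, which is one reason a transpose product over CSR is harder to parallelize than the forward product.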
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); USDOE Office of Science (SC), Nuclear Physics (NP) (SC-26)
- Grant/Contract Number:
- AC02-05CH11231; FG02-87ER40371; SC0008485
- OSTI ID:
- 1379875
- Journal Information:
- IEEE Transactions on Parallel and Distributed Systems, Vol. 28, Issue 6; ISSN 1045-9219
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
Similar Records
Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations
Conference · Thu Aug 14 00:00:00 EDT 2014 · OSTI ID:1407214
On the performance and energy efficiency of sparse linear algebra on GPUs
Journal Article · Tue Oct 04 20:00:00 EDT 2016 · International Journal of High Performance Computing Applications · OSTI ID:1437692
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
Technical Report · Thu Dec 31 23:00:00 EST 2015 · OSTI ID:1237520