GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems
Abstract
The solution of eigenproblems is often a key computational bottleneck that limits the tractable system size of numerical algorithms, among them electronic structure theory in chemistry and in condensed matter physics. Large eigenproblems can easily exceed the capacity of a single compute node, thus must be solved on distributed-memory parallel computers. We here present GPU-oriented optimizations of the ELPA two-stage tridiagonalization eigensolver (ELPA2). On top of cuBLAS-based GPU offloading, we add a CUDA kernel to speed up the back-transformation of eigenvectors, which can be the computationally most expensive part of the two-stage tridiagonalization algorithm. Furthermore, we benchmark the performance of this GPU-accelerated eigensolver on two hybrid CPU–GPU architectures, namely a compute cluster based on Intel Xeon Gold CPUs and NVIDIA Volta GPUs, and the Summit supercomputer based on IBM POWER9 CPUs and NVIDIA Volta GPUs. Consistent with previous benchmarks on CPU-only architectures, the GPU-accelerated two-stage solver exhibits a parallel performance superior to the one-stage counterpart. Finally, we demonstrate the performance of the GPU-accelerated eigensolver developed in this work for routine semi-local KS-DFT calculations comprising thousands of atoms.
- Authors:
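- Yu, Victor Wen-zhe; Moussa, Jonathan; Kůs, Pavel; Marek, Andreas; Messmer, Peter; Yoon, Mina; Lederer, Hermann; Blum, Volker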
- Duke Univ., Durham, NC (United States)
- Molecular Sciences Software Inst., Blacksburg, VA (United States)
- Max Planck Computing and Data Facility, Garching (Germany); Czech Academy of Sciences, Prague (Czech Republic). Inst. of Mathematics
- Max Planck Computing and Data Facility, Garching (Germany)
- NVIDIA Switzerland, Zurich (Switzerland)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Nanophase Materials Sciences (CNMS)
- Publication Date:
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- OSTI Identifier:
- 1773653
- Alternate Identifier(s):
- OSTI ID: 1775683
- Grant/Contract Number:
- AC05-00OR22725
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- Computer Physics Communications
- Additional Journal Information:
- Journal Volume: 262; Journal ID: ISSN 0010-4655
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Eigensolver; dense linear algebra; parallel computing; high-performance computing; GPU; CUDA
Citation Formats
Yu, Victor Wen-zhe, Moussa, Jonathan, Kůs, Pavel, Marek, Andreas, Messmer, Peter, Yoon, Mina, Lederer, Hermann, and Blum, Volker. GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems. United States: N. p., 2020. Web. doi:10.1016/j.cpc.2020.107808.
Yu, Victor Wen-zhe, Moussa, Jonathan, Kůs, Pavel, Marek, Andreas, Messmer, Peter, Yoon, Mina, Lederer, Hermann, & Blum, Volker. GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems. United States. https://doi.org/10.1016/j.cpc.2020.107808
Yu, Victor Wen-zhe, Moussa, Jonathan, Kůs, Pavel, Marek, Andreas, Messmer, Peter, Yoon, Mina, Lederer, Hermann, and Blum, Volker. 2020. "GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems". United States. https://doi.org/10.1016/j.cpc.2020.107808. https://www.osti.gov/servlets/purl/1773653.
@article{osti_1773653,
title = {GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems},
author = {Yu, Victor Wen-zhe and Moussa, Jonathan and Kůs, Pavel and Marek, Andreas and Messmer, Peter and Yoon, Mina and Lederer, Hermann and Blum, Volker},
abstractNote = {The solution of eigenproblems is often a key computational bottleneck that limits the tractable system size of numerical algorithms, among them electronic structure theory in chemistry and in condensed matter physics. Large eigenproblems can easily exceed the capacity of a single compute node, thus must be solved on distributed-memory parallel computers. We here present GPU-oriented optimizations of the ELPA two-stage tridiagonalization eigensolver (ELPA2). On top of cuBLAS-based GPU offloading, we add a CUDA kernel to speed up the back-transformation of eigenvectors, which can be the computationally most expensive part of the two-stage tridiagonalization algorithm. Furthermore, we benchmark the performance of this GPU-accelerated eigensolver on two hybrid CPU–GPU architectures, namely a compute cluster based on Intel Xeon Gold CPUs and NVIDIA Volta GPUs, and the Summit supercomputer based on IBM POWER9 CPUs and NVIDIA Volta GPUs. Consistent with previous benchmarks on CPU-only architectures, the GPU-accelerated two-stage solver exhibits a parallel performance superior to the one-stage counterpart. Finally, we demonstrate the performance of the GPU-accelerated eigensolver developed in this work for routine semi-local KS-DFT calculations comprising thousands of atoms.},
doi = {10.1016/j.cpc.2020.107808},
url = {https://www.osti.gov/biblio/1773653},
journal = {Computer Physics Communications},
issn = {0010-4655},
volume = 262,
place = {United States},
year = {2020},
month = dec
}
Works referenced in this record:
CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations
journal, May 2020
- Kühne, Thomas D.; Iannuzzi, Marcella; Del Ben, Mauro
- The Journal of Chemical Physics, Vol. 152, Issue 19
Towards dense linear algebra for hybrid GPU accelerated manycore systems
journal, June 2010
- Tomov, Stanimire; Dongarra, Jack; Baboulin, Marc
- Parallel Computing, Vol. 36, Issue 5-6
Elemental: A New Framework for Distributed Memory Dense Matrix Computations
journal, February 2013
- Poulson, Jack; Marker, Bryan; van de Geijn, Robert A.
- ACM Transactions on Mathematical Software, Vol. 39, Issue 2
NWChem: Past, present, and future
journal, May 2020
- Aprà, E.; Bylaska, E. J.; de Jong, W. A.
- The Journal of Chemical Physics, Vol. 152, Issue 18
ELSI: A unified software interface for Kohn–Sham electronic structure solvers
journal, January 2018
- Yu, Victor Wen-zhe; Corsetti, Fabiano; García, Alberto
- Computer Physics Communications, Vol. 222
BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures
journal, June 2012
- Deslippe, Jack; Samsonidze, Georgy; Strubbe, David A.
- Computer Physics Communications, Vol. 183, Issue 6
Octopus, a computational framework for exploring light-driven phenomena and quantum dynamics in extended and finite systems
journal, March 2020
- Tancogne-Dejean, Nicolas; Oliveira, Micael J. T.; Andrade, Xavier
- The Journal of Chemical Physics, Vol. 152, Issue 12
Optimizations of the eigensolvers in the ELPA library
journal, July 2019
- Kůs, P.; Marek, A.; Köcher, S. S.
- Parallel Computing, Vol. 85
Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set
journal, October 1996
- Kresse, G.; Furthmüller, J.
- Physical Review B, Vol. 54, Issue 16, p. 11169-11186
A Jacobi–Davidson Iteration Method for Linear Eigenvalue Problems
journal, April 1996
- Sleijpen, Gerard L. G.; Van der Vorst, Henk A.
- SIAM Journal on Matrix Analysis and Applications, Vol. 17, Issue 2
Large scale and linear scaling DFT with the CONQUEST code
journal, April 2020
- Nakata, Ayako; Baker, Jack S.; Mujahed, Shereif Y.
- The Journal of Chemical Physics, Vol. 152, Issue 16
Assessment of localized and randomized algorithms for electronic structure
journal, July 2019
- Moussa, Jonathan E.; Baczewski, Andrew D.
- Electronic Structure, Vol. 1, Issue 3
Integrating state of the art compute, communication, and autotuning strategies to multiply the performance of ab initio molecular dynamics on massively parallel multi-core supercomputers
journal, March 2021
- Klöffel, Tobias; Mathias, Gerald; Meyer, Bernd
- Computer Physics Communications, Vol. 260
Inhomogeneous Electron Gas
journal, November 1964
- Hohenberg, P.; Kohn, W.
- Physical Review, Vol. 136, Issue 3B, p. B864-B871
Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations
journal, December 2011
- Auckenthaler, T.; Blum, V.; Bungartz, H. -J.
- Parallel Computing, Vol. 37, Issue 12
A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures
journal, January 1999
- Tisseur, Françoise; Dongarra, Jack
- SIAM Journal on Scientific Computing, Vol. 20, Issue 6
Variationally optimized atomic orbitals for large-scale electronic structures
journal, April 2003
- Ozaki, T.
- Physical Review B, Vol. 67, Issue 15
A Divide-and-Conquer Algorithm for the Symmetric Tridiagonal Eigenproblem
journal, January 1995
- Gu, Ming; Eisenstat, Stanley C.
- SIAM Journal on Matrix Analysis and Applications, Vol. 16, Issue 1
O(N) methods in electronic structure calculations
journal, February 2012
- Bowler, D. R.; Miyazaki, T.
- Reports on Progress in Physics, Vol. 75, Issue 3
DFTB+, a software package for efficient approximate density functional theory based atomistic simulations
journal, March 2020
- Hourahine, B.; Aradi, B.; Blum, V.
- The Journal of Chemical Physics, Vol. 152, Issue 12
Development of a High-Performance Eigensolver on a Peta-Scale Next-Generation Supercomputer System
journal, January 2011
- Imamura, Toshiyuki; Yamada, Susumu; Machida, Masahiko
- Progress in Nuclear Science and Technology, Vol. 2, Issue 0
BaCu2Sn(S,Se)4: Earth-Abundant Chalcogenides for Thin-Film Photovoltaics
journal, June 2016
- Shin, Donghyeop; Saparov, Bayrammurad; Zhu, Tong
- Chemistry of Materials, Vol. 28, Issue 13
Ab initio molecular simulations with numeric atom-centered orbitals
journal, November 2009
- Blum, Volker; Gehrke, Ralf; Hanke, Felix
- Computer Physics Communications, Vol. 180, Issue 11
Thermodynamic Equilibrium Conditions of Graphene Films on SiC
journal, August 2013
- Nemec, Lydia; Blum, Volker; Rinke, Patrick
- Physical Review Letters, Vol. 111, Issue 6
Quantitative Subsurface Atomic Structure Fingerprint for 2D Materials and Heterostructures by First-Principles-Calibrated Contact-Resonance Atomic Force Microscopy
journal, June 2016
- Tu, Qing; Lange, Björn; Parlak, Zehra
- ACS Nano, Vol. 10, Issue 7
Density-matrix-based algorithm for solving eigenvalue problems
journal, March 2009
- Polizzi, Eric
- Physical Review B, Vol. 79, Issue 11
Massively parallel sparse matrix function calculations with NTPoly
journal, April 2018
- Dawson, William; Nakajima, Takahito
- Computer Physics Communications, Vol. 225
Linear Scaling Self-Consistent Field Calculations with Millions of Atoms in the Condensed Phase
journal, March 2012
- VandeVondele, Joost; Borštnik, Urban; Hutter, Jürg
- Journal of Chemical Theory and Computation, Vol. 8, Issue 10
Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients
journal, October 1992
- Payne, M. C.; Teter, M. P.; Allan, D. C.
- Reviews of Modern Physics, Vol. 64, Issue 4
The Abinit project: Impact, environment and recent developments
journal, March 2020
- Gonze, Xavier; Amadon, Bernard; Antonius, Gabriel
- Computer Physics Communications, Vol. 248
The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices
journal, January 1975
- Davidson, Ernest R.
- Journal of Computational Physics, Vol. 17, Issue 1
A Parallel Algorithm for Reducing Symmetric Banded Matrices to Tridiagonal Form
journal, November 1993
- Lang, Bruno
- SIAM Journal on Scientific Computing, Vol. 14, Issue 6
Quantum ESPRESSO toward the exascale
journal, April 2020
- Giannozzi, Paolo; Baseggio, Oscar; Bonfà, Pietro
- The Journal of Chemical Physics, Vol. 152, Issue 15
GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions
journal, September 2020
- Huhn, William P.; Lange, Björn; Yu, Victor Wen-zhe
- Computer Physics Communications, Vol. 254
ELSI — An open infrastructure for electronic structure solvers
journal, November 2020
- Yu, Victor Wen-zhe; Campos, Carmen; Dawson, William
- Computer Physics Communications, Vol. 256
Unitary Triangularization of a Nonsymmetric Matrix
journal, October 1958
- Householder, Alston S.
- Journal of the ACM (JACM), Vol. 5, Issue 4
Ultra-Performance Pascal GPU and NVLink Interconnect
journal, March 2017
- Foley, Denis; Danskin, John
- IEEE Micro, Vol. 37, Issue 2
Self-Consistent Equations Including Exchange and Correlation Effects
journal, November 1965
- Kohn, W.; Sham, L. J.
- Physical Review, Vol. 140, Issue 4A, p. A1133-A1138
WIEN2k: An APW+lo program for calculating the properties of solids
journal, February 2020
- Blaha, Peter; Schwarz, Karlheinz; Tran, Fabien
- The Journal of Chemical Physics, Vol. 152, Issue 7
ELSI — An open infrastructure for electronic structure solvers
dataset, January 2020
- Yu, Victor Wen-Zhe
- Mendeley
A divide and conquer method for the symmetric tridiagonal eigenproblem
journal, June 1980
- Cuppen, J. J. M.
- Numerische Mathematik, Vol. 36, Issue 2
Introducing ONETEP: Linear-scaling density functional simulations on parallel computers
journal, February 2005
- Skylaris, Chris-Kriton; Haynes, Peter D.; Mostofi, Arash A.
- The Journal of Chemical Physics, Vol. 122, Issue 8
Scalable parallel programming
conference, August 2008
- Nickolls, John; Buck, Ian; Garland, Michael
- 2008 IEEE Hot Chips 20 Symposium (HCS)