OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems

Abstract

The solution of eigenproblems is often a key computational bottleneck that limits the tractable system size of numerical algorithms, among them electronic structure theory in chemistry and in condensed matter physics. Large eigenproblems can easily exceed the capacity of a single compute node and thus must be solved on distributed-memory parallel computers. We here present GPU-oriented optimizations of the ELPA two-stage tridiagonalization eigensolver (ELPA2). On top of cuBLAS-based GPU offloading, we add a CUDA kernel to speed up the back-transformation of eigenvectors, which can be the computationally most expensive part of the two-stage tridiagonalization algorithm. Furthermore, we benchmark the performance of this GPU-accelerated eigensolver on two hybrid CPU–GPU architectures, namely a compute cluster based on Intel Xeon Gold CPUs and NVIDIA Volta GPUs, and the Summit supercomputer based on IBM POWER9 CPUs and NVIDIA Volta GPUs. Consistent with previous benchmarks on CPU-only architectures, the GPU-accelerated two-stage solver exhibits a parallel performance superior to the one-stage counterpart. Finally, we demonstrate the performance of the GPU-accelerated eigensolver developed in this work for routine semi-local KS-DFT calculations comprising thousands of atoms.
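
To make the two-stage structure concrete, here is a minimal NumPy sketch, assuming a small real symmetric matrix on a single process. It is not ELPA's implementation, and the functions `householder`, `reduce_to_band`, `back_transform`, and `two_stage_eigh` are hypothetical illustrations. Stage 1 uses unblocked, column-by-column Householder reflectors rather than blocked panel factorizations; stage 2 reuses the same dense routine with bandwidth 1 as a stand-in for a band-to-tridiagonal (bulge-chasing) reduction; and `numpy.linalg.eigh` stands in for the tridiagonal eigensolver. What the sketch does show is that the eigenvectors must be back-transformed twice, once per reduction stage.

```python
import numpy as np


def householder(x):
    """Return (v, beta) such that (I - beta*v*v^T) @ x = alpha*e_1."""
    v = np.array(x, dtype=float)          # copy; x itself is not modified
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v, 0.0                     # nothing to annihilate: identity reflector
    alpha = -np.copysign(norm_x, v[0])    # sign choice avoids cancellation in v[0]
    v[0] -= alpha
    beta = 2.0 / np.dot(v, v)
    return v, beta


def reduce_to_band(A, b):
    """Unblocked symmetric reduction to semi-bandwidth b: returns (B, reflectors)
    with B = Q^T A Q and Q = H_1 H_2 ... H_m (hypothetical helper)."""
    A = np.array(A, dtype=float)          # work on a copy
    n = A.shape[0]
    reflectors = []
    for k in range(n - b - 1):
        s = k + b                         # reflector acts on rows/columns s..n-1
        v, beta = householder(A[s:, k])   # zeroes A[s+1:, k]
        A[s:, :] -= beta * np.outer(v, v @ A[s:, :])   # left:  H A
        A[:, s:] -= beta * np.outer(A[:, s:] @ v, v)   # right: (H A) H
        reflectors.append((s, v, beta))
    return A, reflectors


def back_transform(Z, reflectors):
    """Return Q @ Z with Q = H_1 H_2 ... H_m (reflectors applied last-to-first)."""
    Z = np.array(Z, dtype=float)
    for s, v, beta in reversed(reflectors):
        Z[s:, :] -= beta * np.outer(v, v @ Z[s:, :])
    return Z


def two_stage_eigh(A, b):
    """Eigenpairs of symmetric A via full -> band -> tridiagonal reduction."""
    B, refl_full_to_band = reduce_to_band(A, b)     # stage 1: full -> band
    T, refl_band_to_tri = reduce_to_band(B, 1)      # stage 2 stand-in: band -> tridiagonal
    d, e = np.diag(T), np.diag(T, 1)                # drop round-off outside the tridiagonal
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    w, Z = np.linalg.eigh(T)                        # stand-in tridiagonal eigensolver
    Z = back_transform(Z, refl_band_to_tri)         # back-transformation 1: tridiagonal -> band
    V = back_transform(Z, refl_full_to_band)        # back-transformation 2: band -> full
    return w, V


rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2                                   # random real symmetric test matrix
w, V = two_stage_eigh(A, b=3)
print(np.allclose(A @ V, V * w))                    # expected: True
```

In this sketch each back-transformation applies roughly N reflectors of length up to N to the N-by-k block of eigenvectors, so adding a second reduction stage roughly doubles that work. This is one way to see why the back-transformation can become the most expensive step when many eigenvectors are requested, and why it is a natural target for a dedicated GPU kernel.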

Authors:
Yu, Victor Wen-zhe [1]; Moussa, Jonathan [2]; Kůs, Pavel [3]; Marek, Andreas [4]; Messmer, Peter [5]; Yoon, Mina [6]; Lederer, Hermann [4]; Blum, Volker [1]
  1. Duke Univ., Durham, NC (United States)
  2. Molecular Sciences Software Inst., Blacksburg, VA (United States)
  3. Max Planck Computing and Data Facility, Garching (Germany); Czech Academy of Sciences, Prague (Czech Republic). Inst. of Mathematics
  4. Max Planck Computing and Data Facility, Garching (Germany)
  5. NVIDIA Switzerland, Zurich (Switzerland)
  6. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Center for Nanophase Materials Sciences (CNMS)
Publication Date:
2020
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1773653
Alternate Identifier(s):
OSTI ID: 1775683
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Computer Physics Communications
Additional Journal Information:
Journal Volume: 262; Journal ID: ISSN 0010-4655
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Eigensolver; dense linear algebra; parallel computing; high-performance computing; GPU; CUDA

Citation Formats

Yu, Victor Wen-zhe, Moussa, Jonathan, Kůs, Pavel, Marek, Andreas, Messmer, Peter, Yoon, Mina, Lederer, Hermann, and Blum, Volker. GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems. United States: N. p., 2020. Web. doi:10.1016/j.cpc.2020.107808.
Yu, Victor Wen-zhe, Moussa, Jonathan, Kůs, Pavel, Marek, Andreas, Messmer, Peter, Yoon, Mina, Lederer, Hermann, & Blum, Volker. GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems. United States. https://doi.org/10.1016/j.cpc.2020.107808
Yu, Victor Wen-zhe, Moussa, Jonathan, Kůs, Pavel, Marek, Andreas, Messmer, Peter, Yoon, Mina, Lederer, Hermann, and Blum, Volker. 2020. "GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems". United States. https://doi.org/10.1016/j.cpc.2020.107808. https://www.osti.gov/servlets/purl/1773653.
@article{osti_1773653,
title = {GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and hermitian eigenproblems},
author = {Yu, Victor Wen-zhe and Moussa, Jonathan and Kůs, Pavel and Marek, Andreas and Messmer, Peter and Yoon, Mina and Lederer, Hermann and Blum, Volker},
abstractNote = {The solution of eigenproblems is often a key computational bottleneck that limits the tractable system size of numerical algorithms, among them electronic structure theory in chemistry and in condensed matter physics. Large eigenproblems can easily exceed the capacity of a single compute node and thus must be solved on distributed-memory parallel computers. We here present GPU-oriented optimizations of the ELPA two-stage tridiagonalization eigensolver (ELPA2). On top of cuBLAS-based GPU offloading, we add a CUDA kernel to speed up the back-transformation of eigenvectors, which can be the computationally most expensive part of the two-stage tridiagonalization algorithm. Furthermore, we benchmark the performance of this GPU-accelerated eigensolver on two hybrid CPU–GPU architectures, namely a compute cluster based on Intel Xeon Gold CPUs and NVIDIA Volta GPUs, and the Summit supercomputer based on IBM POWER9 CPUs and NVIDIA Volta GPUs. Consistent with previous benchmarks on CPU-only architectures, the GPU-accelerated two-stage solver exhibits a parallel performance superior to the one-stage counterpart. Finally, we demonstrate the performance of the GPU-accelerated eigensolver developed in this work for routine semi-local KS-DFT calculations comprising thousands of atoms.},
doi = {10.1016/j.cpc.2020.107808},
url = {https://www.osti.gov/biblio/1773653},
journal = {Computer Physics Communications},
issn = {0010-4655},
volume = 262,
place = {United States},
year = {2020}
}

Works referenced in this record:

CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations
journal, May 2020


Towards dense linear algebra for hybrid GPU accelerated manycore systems
journal, June 2010


Elemental: A New Framework for Distributed Memory Dense Matrix Computations
journal, February 2013


NWChem: Past, present, and future
journal, May 2020


ELSI: A unified software interface for Kohn–Sham electronic structure solvers
journal, January 2018


BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures
journal, June 2012


Octopus, a computational framework for exploring light-driven phenomena and quantum dynamics in extended and finite systems
journal, March 2020


Optimizations of the eigensolvers in the ELPA library
journal, July 2019


Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set
journal, October 1996


A Jacobi–Davidson Iteration Method for Linear Eigenvalue Problems
journal, April 1996


Large scale and linear scaling DFT with the CONQUEST code
journal, April 2020


Assessment of localized and randomized algorithms for electronic structure
journal, July 2019


Inhomogeneous Electron Gas
journal, November 1964


Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations
journal, December 2011


A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures
journal, January 1999


A Divide-and-Conquer Algorithm for the Symmetric Tridiagonal Eigenproblem
journal, January 1995


O(N) methods in electronic structure calculations
journal, February 2012


DFTB+, a software package for efficient approximate density functional theory based atomistic simulations
journal, March 2020


Development of a High-Performance Eigensolver on a Peta-Scale Next-Generation Supercomputer System
journal, January 2011


BaCu₂Sn(S,Se)₄: Earth-Abundant Chalcogenides for Thin-Film Photovoltaics
journal, June 2016


Ab initio molecular simulations with numeric atom-centered orbitals
journal, November 2009


Thermodynamic Equilibrium Conditions of Graphene Films on SiC
journal, August 2013


Density-matrix-based algorithm for solving eigenvalue problems
journal, March 2009


Massively parallel sparse matrix function calculations with NTPoly
journal, April 2018


Linear Scaling Self-Consistent Field Calculations with Millions of Atoms in the Condensed Phase
journal, March 2012


Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients
journal, October 1992


The Abinit project: Impact, environment and recent developments
journal, March 2020


A Parallel Algorithm for Reducing Symmetric Banded Matrices to Tridiagonal Form
journal, November 1993


Quantum ESPRESSO toward the exascale
journal, April 2020


GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions
journal, September 2020


ELSI — An open infrastructure for electronic structure solvers
journal, November 2020


Unitary Triangularization of a Nonsymmetric Matrix
journal, October 1958


Ultra-Performance Pascal GPU and NVLink Interconnect
journal, March 2017


Self-Consistent Equations Including Exchange and Correlation Effects
journal, November 1965


WIEN2k: An APW+lo program for calculating the properties of solids
journal, February 2020


A divide and conquer method for the symmetric tridiagonal eigenproblem
journal, June 1980


Introducing ONETEP: Linear-scaling density functional simulations on parallel computers
journal, February 2005


Scalable parallel programming
conference, August 2008