U.S. Department of Energy
Office of Scientific and Technical Information

A parallel strategy for density functional theory computations on accelerated nodes

Journal Article · Parallel Computing
Using the Löwdin orthonormalization of tall-skinny matrices as a proxy-app for wavefunction-based Density Functional Theory (DFT) solvers, we investigate a distributed-memory parallel strategy focusing on Graphics Processing Unit (GPU)-accelerated nodes, as available on some of the top-ranked supercomputers at the present time. We present numerical results in the strong scaling limit regime, which is particularly relevant for First-Principles Molecular Dynamics. We also examine how matrix product-based iterative solvers provide a competitive alternative to dense eigensolvers on GPUs, allowing us to push the strong scaling limit of these computations to a larger number of distributed tasks. Our strategy, which relies on replicated Gram matrices and efficient collective communications using the NCCL library, leads to a time-to-solution under 0.5 s for the Löwdin orthonormalization of a tall-skinny matrix of 3000 columns on Summit at the Oak Ridge Leadership Computing Facility (OLCF). Given the similarity in computational operations between one iteration of a DFT solver and this proxy-app, this shows the possibility of accurately solving the DFT equations in well under a minute for 3000 electronic wave functions, and thus of performing First-Principles Molecular Dynamics of physical systems much larger than those traditionally solved on CPU systems.
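As a rough illustration of the operations named in the abstract, the following single-process NumPy sketch forms the Gram matrix S = AᵀA of a tall-skinny matrix A and computes the Löwdin-orthonormalized matrix Q = A S^(-1/2) in two ways: with a dense eigensolver, and with an iteration built only from matrix products. The abstract does not say which iterative scheme the authors use, so the coupled Newton-Schulz iteration here is an assumed stand-in. All function names are hypothetical, and the sum over row blocks only mimics what an all-reduce (for example through NCCL) would do across ranks.

```python
import numpy as np


def gram_from_blocks(row_blocks):
    """Sum local contributions B^T B over row blocks of A.

    Assumption: each block stands for the rows owned by one rank; the sum
    plays the role of an all-reduce that leaves S replicated everywhere.
    """
    return sum(B.T @ B for B in row_blocks)


def inv_sqrt_eig(S):
    """S^(-1/2) from a dense symmetric eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T  # V diag(w^(-1/2)) V^T


def inv_sqrt_newton_schulz(S, tol=1e-10, max_iter=100):
    """S^(-1/2) using only matrix products (coupled Newton-Schulz).

    Assumption: S is symmetric positive definite. Scaling by the Frobenius
    norm puts the eigenvalues in (0, 1], which the iteration needs in order
    to converge.
    """
    n = S.shape[0]
    I = np.eye(n)
    scale = np.linalg.norm(S)        # Frobenius norm >= largest eigenvalue
    Y = S / scale                    # Y converges to (S/scale)^(1/2)
    Z = I.copy()                     # Z converges to (S/scale)^(-1/2)
    for _ in range(max_iter):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T
        Z = T @ Z
        if np.linalg.norm(I - Z @ Y) < tol:
            break
    return Z / np.sqrt(scale)        # undo the scaling


def lowdin(A, inv_sqrt=inv_sqrt_eig, n_blocks=4):
    """Return Q = A S^(-1/2), whose columns are orthonormal."""
    S = gram_from_blocks(np.array_split(A, n_blocks, axis=0))
    return A @ inv_sqrt(S)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 16))   # small tall-skinny test matrix
    Q = lowdin(A, inv_sqrt_newton_schulz)
    print(np.linalg.norm(Q.T @ Q - np.eye(16)))  # should be near zero
```

In this sketch the only step that would need communication is the reduction of the small Gram matrix. The final product A S^(-1/2) acts on locally owned rows, which is one reason a replicated Gram matrix suits tall-skinny problems.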
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1731060
Alternate ID(s):
OSTI ID: 1809491
Journal Information:
Parallel Computing, Vol. 100; ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
