DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

Abstract

The predominance of Kohn–Sham density functional theory (KS-DFT) for the theoretical treatment of large, experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations capable of leveraging the latest advances in modern high-performance computing (HPC). With recent trends in HPC leading toward increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance that have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn–Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose the use of batched kernels, including batched level-3 BLAS operations, and demonstrate their efficacy in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparison with the existing scalable CPU XC integration in NWChem.
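The quadrature sums below show why batched level-3 BLAS fits this problem. For a local density approximation (LDA) functional in a Gaussian basis, the XC quantities reduce to sums over grid points r_i with weights w_i:

- rho(r_i) = sum_{mu,nu} P_{mu nu} phi_mu(r_i) phi_nu(r_i)
- E_xc ≈ sum_i w_i rho(r_i) eps_xc(rho(r_i))
- V_{mu nu} ≈ sum_i w_i v_xc(r_i) phi_mu(r_i) phi_nu(r_i)

Stacking the basis values for one batch of points into a matrix X turns both the density and the potential matrix into matrix products (GEMMs).

The following is a minimal NumPy sketch under assumptions not taken from this record:

- Only Slater exchange is used, spin-unpolarized, as a stand-in functional.
- Basis values are precomputed.
- A per-batch list of retained basis functions is given.
- The function names are hypothetical.

The sketch processes batches one after another. The batched-kernel approach described in the abstract would instead submit the GEMMs of many batches, with differing sizes, as one batched operation. The actual NWChemEx kernels are not reproduced here.

```python
import numpy as np

# Slater (LDA) exchange prefactor: E_x = C_X * integral of rho^(4/3).
# Assumption: used only as a simple stand-in functional, not taken from the record.
C_X = -0.75 * (3.0 / np.pi) ** (1.0 / 3.0)


def lda_exchange(rho):
    """Spin-unpolarized Slater exchange: energy per particle and potential."""
    rho13 = np.cbrt(np.maximum(rho, 0.0))
    eps = C_X * rho13                  # eps_x(rho)
    v = (4.0 / 3.0) * C_X * rho13      # v_x = d(rho * eps_x) / d rho
    return eps, v


def integrate_batch(X, w, P):
    """Hypothetical helper: XC energy and potential matrix for one batch.

    X: (nbf, npts) basis function values on the batch's grid points
    w: (npts,) quadrature weights
    P: (nbf, nbf) symmetric density matrix restricted to these functions
    """
    Z = P @ X                          # level-3 BLAS (GEMM)
    rho = np.sum(X * Z, axis=0)        # rho_i = sum_mu X[mu, i] * Z[mu, i]
    eps, v = lda_exchange(rho)
    exc = float(np.dot(w, rho * eps))  # quadrature for E_xc
    V = (X * (w * v)) @ X.T            # GEMM: V = X diag(w * v) X^T
    return exc, V


def integrate_xc(batches, P):
    """Hypothetical driver: accumulate (idx, X, w) batches into full-size results.

    idx lists the basis functions kept for a batch after (assumed) screening,
    and X holds only those rows. Batches run serially here; a GPU code would
    hand many such GEMMs to a batched kernel instead.
    """
    E = 0.0
    V = np.zeros_like(P)
    for idx, X, w in batches:
        sub = np.ix_(idx, idx)         # gather/scatter the retained sub-block
        e, V_sub = integrate_batch(X, w, P[sub])
        E += e
        V[sub] += V_sub
    return E, V
```

V is the derivative of E with respect to the density-matrix elements, so a central finite difference on a single element of P is a convenient sanity check for a sketch like this.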

Authors:
Williams-Young, David B [1]; de Jong, Wibe A. [1]; van Dam, Hubertus J. J. [2]; Yang, Chao [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Brookhaven National Lab. (BNL), Upton, NY (United States)
Publication Date:
Thu Dec 10 00:00:00 EST 2020
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research; USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities Division
OSTI Identifier:
1650078
Alternate Identifier(s):
OSTI ID: 1764587
Report Number(s):
BNL-220973-2021-JAAM
Journal ID: ISSN 2296-2646; ark:/13030/qt0ms5611x
Grant/Contract Number:  
AC02-05CH11231; AC05-00OR22725; SC0012704
Resource Type:
Accepted Manuscript
Journal Name:
Frontiers in Chemistry
Additional Journal Information:
Journal Volume: 8; Journal ID: ISSN 2296-2646
Publisher:
Frontiers Research Foundation
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; density functional theory; graphics processing unit; high-performance computing; quantum chemistry; parallel computing

Citation Formats

Williams-Young, David B, de Jong, Wibe A., van Dam, Hubertus J. J., and Yang, Chao. On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters. United States: N. p., 2020. Web. doi:10.3389/fchem.2020.581058.
Williams-Young, David B, de Jong, Wibe A., van Dam, Hubertus J. J., & Yang, Chao. On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters. United States. https://doi.org/10.3389/fchem.2020.581058
Williams-Young, David B, de Jong, Wibe A., van Dam, Hubertus J. J., and Yang, Chao. 2020. "On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters". United States. https://doi.org/10.3389/fchem.2020.581058. https://www.osti.gov/servlets/purl/1650078.
@article{osti_1650078,
title = {On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters},
author = {Williams-Young, David B and de Jong, Wibe A. and van Dam, Hubertus J. J. and Yang, Chao},
abstractNote = {The predominance of Kohn–Sham density functional theory (KS-DFT) for the theoretical treatment of large, experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations capable of leveraging the latest advances in modern high-performance computing (HPC). With recent trends in HPC leading toward increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance that have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn–Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose the use of batched kernels, including batched level-3 BLAS operations, and demonstrate their efficacy in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparison with the existing scalable CPU XC integration in NWChem.},
doi = {10.3389/fchem.2020.581058},
journal = {Frontiers in Chemistry},
volume = 8,
place = {United States},
year = {2020},
month = dec
}