U.S. Department of Energy
Office of Scientific and Technical Information

Multinode Multi-GPU Two-Electron Integrals: Code Generation Using the Regent Language

Journal Article · Journal of Chemical Theory and Computation
 [1];  [2];  [3];  [2];  [4];  [5]
  1. Stanford Univ., CA (United States); SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States); SLAC
  2. SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States)
  3. Stanford Univ., CA (United States)
  4. SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States); Stanford Univ., CA (United States)
  5. Stanford Univ., CA (United States); SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States)

The computation of two-electron repulsion integrals (ERIs) is often the most expensive step of integral-direct self-consistent field methods. Formally it scales as O(N^4), where N is the number of Gaussian basis functions used to represent the molecular wave function. In practice, this scaling can be reduced to O(N^2) or less by neglecting small integrals with screening methods. The contributions of the ERIs to the Fock matrix are of Coulomb (J) and exchange (K) type and require separate algorithms to compute matrix elements efficiently. We previously implemented highly efficient GPU-accelerated J-matrix and K-matrix algorithms in the electronic structure code TeraChem. Although these implementations supported the use of multiple GPUs on a node, they did not support the use of multiple nodes. This presents a key bottleneck to cutting-edge ab initio simulations of large systems, e.g., excited state dynamics of photoactive proteins. We present our implementation of multinode multi-GPU J- and K-matrix algorithms in TeraChem using the Regent programming language. Regent directly supports distributed computation in a task-based model and can generate code for a variety of architectures, including NVIDIA GPUs. We demonstrate multinode scaling up to 45 GPUs (3 nodes) and benchmark against hand-coded TeraChem integral code. Finally, we also outline our metaprogrammed Regent implementation, which enables flexible code generation for integrals of different angular momenta.
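
The drop from O(N^4) to roughly O(N^2) under screening can be illustrated with a toy count of surviving integrals. The sketch below is not taken from the article or from TeraChem: it assumes a Cauchy-Schwarz-style bound (the abstract does not name a specific screening method), a hypothetical chain of identical s-type Gaussians, and a simplified pair factor with constant prefactors dropped. The function name `count_quartets` and all parameter values are illustrative only.

```python
import numpy as np

def count_quartets(n, spacing=2.0, alpha=1.0, tau=1e-10):
    """Count (ij|kl) quartets before and after a Schwarz-style screen.

    Toy model (an assumption, not the paper's method): n identical s-type
    Gaussians on a line, with the pair bound Q_ij modeled as
    exp(-alpha * R_ij**2 / 2) and constant prefactors dropped.
    """
    x = spacing * np.arange(n)
    # log Q_ij for every ordered pair; logs avoid underflow at long range
    log_q = (-0.5 * alpha * (x[:, None] - x[None, :]) ** 2).ravel()
    log_q_sorted = np.sort(log_q)
    # A quartet survives when Q_ij * Q_kl >= tau, i.e.
    # log Q_kl >= log(tau) - log Q_ij. Count the matches by binary search.
    cutoff = np.log(tau) - log_q
    first_kept = np.searchsorted(log_q_sorted, cutoff, side="left")
    kept = int(np.sum(log_q.size - first_kept))
    total = n ** 4  # all ordered quartets, permutational symmetry ignored
    return total, kept

if __name__ == "__main__":
    for n in (25, 50, 100):
        total, kept = count_quartets(n)
        print(f"N={n:4d}  all quartets={total:>10d}  kept={kept:>8d}")
```

With these hypothetical parameters, doubling N multiplies the unscreened count by sixteen but the screened count by roughly four, because only a bounded number of neighbors per function forms a non-negligible pair in a spatially extended system.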

Research Organization:
SLAC National Accelerator Laboratory (SLAC), Menlo Park, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR). Scientific Discovery through Advanced Computing (SciDAC)
Grant/Contract Number:
AC02-76SF00515; SC0019323
OSTI ID:
1998577
Journal Information:
Journal of Chemical Theory and Computation, Vol. 18, Issue 11; ISSN 1549-9618
Publisher:
American Chemical Society
Country of Publication:
United States
Language:
English
