Parallel scalability of Hartree–Fock calculations
Abstract
Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree–Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
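For context, the density matrix purification mentioned in the abstract can be illustrated with McWeeny's iteration, D ← 3D² − 2D³, whose cost is two dense matrix multiplications per sweep. The sketch below is a minimal serial NumPy version, not the paper's distributed implementation; it assumes the Fock matrix is already in an orthogonal basis and that a chemical potential `mu` lying in the HOMO–LUMO gap is known (both assumptions, not details from the paper).

```python
import numpy as np

def mcweeny_purify(F, mu, max_iter=100, tol=1e-12):
    """Density matrix from a Fock matrix F (orthogonal basis) via
    McWeeny purification, D <- 3 D^2 - 2 D^3.

    `mu` is assumed to lie in the HOMO-LUMO gap; the iteration then
    converges to the projector onto the occupied eigenspace of F
    without ever diagonalizing F.
    """
    n = F.shape[0]
    # Gershgorin bounds on the spectrum of F (cheap, no eigensolve).
    r = np.sum(np.abs(F), axis=1) - np.abs(np.diag(F))
    lo, hi = np.min(np.diag(F) - r), np.max(np.diag(F) + r)
    # Linear initial map: occupied eigenvalues land in (1/2, 1],
    # virtual eigenvalues in [0, 1/2), so purification separates them.
    theta = max(hi - mu, mu - lo)
    D = 0.5 * (np.eye(n) + (mu * np.eye(n) - F) / theta)
    for _ in range(max_iter):
        D2 = D @ D                        # first of two matmuls per sweep
        if np.linalg.norm(D2 - D) < tol:  # idempotent => converged
            break
        D = 3.0 * D2 - 2.0 * D2 @ D       # second matmul
    return D
```

In a distributed setting these matrix multiplications are exactly the operations whose communication cost the paper analyzes, which is why purification becomes attractive when eigendecomposition is bandwidth bound.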
 Authors:
 Chow, Edmond (echow@cc.gatech.edu); Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.
 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0765 (United States)
 Parallel Computing Lab, Intel Corporation, Santa Clara, California 95054-1549 (United States)
 Publication Date:
 March 2015
 OSTI Identifier:
 22415491
 Resource Type:
 Journal Article
 Resource Relation:
 Journal Name: Journal of Chemical Physics; Journal Volume: 142; Journal Issue: 10; Other Information: (c) 2015 AIP Publishing LLC; Country of input: International Atomic Energy Agency (IAEA)
 Country of Publication:
 United States
 Language:
 English
 Subject:
 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS; 37 INORGANIC, ORGANIC, PHYSICAL AND ANALYTICAL CHEMISTRY; CALCULATION METHODS; CHEMISTRY; DENSITY MATRIX; EFFICIENCY; HARTREE-FOCK METHOD; PARTITION; POTENTIALS; PURIFICATION
Citation Formats
Chow, Edmond (echow@cc.gatech.edu), Liu, Xing, Smelyanskiy, Mikhail, and Hammond, Jeff R. Parallel scalability of Hartree–Fock calculations. United States: N. p., 2015.
Web. doi:10.1063/1.4913961.
Chow, Edmond (echow@cc.gatech.edu), Liu, Xing, Smelyanskiy, Mikhail, & Hammond, Jeff R. Parallel scalability of Hartree–Fock calculations. United States. doi:10.1063/1.4913961.
Chow, Edmond (echow@cc.gatech.edu), Liu, Xing, Smelyanskiy, Mikhail, and Hammond, Jeff R. 2015.
"Parallel scalability of Hartree–Fock calculations". United States.
doi:10.1063/1.4913961.
@article{osti_22415491,
title = {Parallel scalability of Hartree–Fock calculations},
author = {Chow, Edmond and Liu, Xing and Smelyanskiy, Mikhail and Hammond, Jeff R.},
abstractNote = {Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree–Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.},
doi = {10.1063/1.4913961},
journal = {Journal of Chemical Physics},
number = 10,
volume = 142,
place = {United States},
year = 2015,
month = 3
}

The parallel performance of the NWChem version 1.2α parallel direct-SCF code has been characterized on five massively parallel supercomputers (IBM SP, Kendall Square KSR-2, CRAY T3D and T3E, and Intel Touchstone DELTA) using single-point energy calculations on seven molecules of varying size (up to 389 atoms) and composition (first-row atoms, halogens, and transition metals). The authors compare the performance using both replicated-data and distributed-data algorithms and the original McMurchie-Davidson and recently incorporated TEXAS integrals packages.

Scalability of Correlated Electronic Structure Calculations on Parallel Computers: A Case Study of the RI-MP2 Method
The RI-MP2 method arises from the application of the "resolution of the identity" (RI) integral approximation to second-order many-body perturbation theory (MBPT). It provides a lower-cost alternative to the MP2 method, widely used in the computational chemistry community. This paper describes an implementation of the RI-MP2 method using the Global Arrays toolkit and examines its performance as a function of the number of processors. Large-scale calculations are dominated by a parallel matrix multiplication, and scale quite well from 16 to 128 processors on an IBM RS/6000 SP system. It is estimated that exact MP2 calculations on the largest system reported here might take as much as 90 times longer than the RI-MP2 calculations.
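To make concrete why large-scale RI-MP2 is dominated by a matrix multiplication: with fitted three-index integrals B[Q, i, a] (auxiliary index Q, occupied i, virtual a), all four-index integrals (ia|jb) ≈ Σ_Q B[Q,i,a] B[Q,j,b] come from a single large GEMM. The sketch below is a minimal dense NumPy illustration of that step and the closed-shell MP2 energy expression, not the implementation described in the abstract; the function name, array layout, and explicit energy loops are illustrative assumptions.

```python
import numpy as np

def rimp2_energy(B, eps_occ, eps_vir):
    """Closed-shell RI-MP2 correlation energy from fitted three-index
    integrals B[Q, i, a].

    (ia|jb) is approximated by sum_Q B[Q,i,a] * B[Q,j,b]; forming all
    of these at once is one (no*nv) x (no*nv) matrix multiplication,
    the step that dominates large calculations and parallelizes well.
    """
    nQ, no, nv = B.shape
    Bm = B.reshape(nQ, no * nv)
    # The dominant GEMM: all approximate (ia|jb) integrals at once.
    iajb = (Bm.T @ Bm).reshape(no, nv, no, nv)
    # Standard closed-shell MP2 energy: t(2t - t_exch) / denominator.
    e = 0.0
    for i in range(no):
        for j in range(no):
            for a in range(nv):
                for b in range(nv):
                    denom = eps_occ[i] + eps_occ[j] - eps_vir[a] - eps_vir[b]
                    t = iajb[i, a, j, b]
                    e += t * (2.0 * t - iajb[i, b, j, a]) / denom
    return e
```

The O((no·nv)²·nQ) GEMM replaces the much larger four-index transformation of exact MP2, which is the source of the cost reduction the abstract describes.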