OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: A Hartree-Fock Application Using UPC++ and the New DArray Library

Journal Article · Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)
 [1];  [2];  [2];  [2];  [3];  [1];  [2];  [2]
  1. Univ. of Oregon, Eugene, OR (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  3. Intel Corporation, Portland, OR (United States)

The Hartree-Fock (HF) method is the fundamental first step for incorporating quantum mechanics into many-electron simulations of atoms and molecules, and it is an important component of computational chemistry toolkits like NWChem. The GTFock code is an HF implementation that lacks some of NWChem's features but represents crucial algorithmic advances: it reduces communication and improves load balance by statically partitioning tasks up front and then stealing work whenever necessary. To enable innovations in algorithms and exploit next-generation exascale systems, quantum chemistry codes need programming models and runtime systems that are expressive and convenient as well as efficient and scalable. This paper presents an HF implementation similar to GTFock using UPC++, a partitioned global address space (PGAS) model that includes flexible communication, asynchronous remote computation, and a powerful multidimensional array library. UPC++ offers runtime features that are useful for HF, such as active messages, a rich calculus for array operations, hardware-supported fetch-and-add, and functions for ensuring asynchronous runtime progress. We present a new distributed array abstraction, DArray, that is convenient for random-access array updates and linear algebra operations on block-distributed arrays with irregular data ownership. Finally, we analyze the performance of atomic fetch-and-add operations (relevant for load balancing) and runtime attentiveness, then compare various techniques and optimizations for each. Our optimized implementation of HF using UPC++ and the DArray library shows up to 20% improvement over GTFock with Global Arrays at scales up to 24,000 cores.
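
The load-balancing scheme named in the abstract (static partitioning up front, then work stealing, with tasks claimed by atomic fetch-and-add) can be illustrated with a minimal shared-memory sketch. This is not UPC++ code and not the authors' implementation: it uses only standard C++ threads and `std::atomic`, and the names `TaskRange` and `run_static_then_steal` are hypothetical. In a PGAS setting the counters would presumably live in remotely accessible memory and the fetch-and-add would be a remote operation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-worker queue: a contiguous range of task indices plus an
// atomic cursor that any worker may advance with fetch-and-add.
struct TaskRange {
    std::atomic<std::size_t> next{0};  // next unclaimed task index
    std::size_t end = 0;               // one past the last task in the range
};

// Runs num_tasks dummy tasks on num_workers threads. Returns, for each task,
// how many times it was executed (every entry should be 1).
std::vector<int> run_static_then_steal(std::size_t num_workers,
                                       std::size_t num_tasks) {
    assert(num_workers > 0);

    std::vector<TaskRange> ranges(num_workers);
    std::vector<std::atomic<int>> executed(num_tasks);
    for (auto& e : executed) e.store(0);

    // Up-front static partitioning: worker w owns an even contiguous slice.
    for (std::size_t w = 0; w < num_workers; ++w) {
        ranges[w].next.store(w * num_tasks / num_workers);
        ranges[w].end = (w + 1) * num_tasks / num_workers;
    }

    // Claim tasks from one range until it is exhausted. fetch_add hands each
    // index to exactly one caller, so no task is run twice.
    auto drain = [&](TaskRange& r) {
        for (;;) {
            std::size_t t = r.next.fetch_add(1, std::memory_order_relaxed);
            if (t >= r.end) break;
            executed[t].fetch_add(1, std::memory_order_relaxed);  // stand-in for real work
        }
    };

    auto worker = [&](std::size_t me) {
        drain(ranges[me]);  // first, the worker's own static share
        for (std::size_t k = 1; k < num_workers; ++k) {
            drain(ranges[(me + k) % num_workers]);  // then steal from the others
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < num_workers; ++w) threads.emplace_back(worker, w);
    for (auto& t : threads) t.join();

    std::vector<int> counts(num_tasks);
    for (std::size_t i = 0; i < num_tasks; ++i) counts[i] = executed[i].load();
    return counts;
}
```

Calling `run_static_then_steal(4, 1000)` should return 1000 entries, each equal to 1. The sketch steals one task at a time in a fixed round-robin victim order; real implementations typically steal in larger blocks and choose victims more carefully.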

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-05CH11231; SC0006723; SC0012381; SC0005360
OSTI ID:
1379522
Journal Information:
Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS), Conference: 2016 IEEE 30th International Parallel and Distributed Processing Symposium, Chicago, IL (United States), 23-27 May 2016; ISSN 1530-2075
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 2 works (citation information provided by Web of Science)


Cited By (1)

Techniques for high-performance construction of Fock matrices journal January 2020