A Hartree-Fock Application Using UPC++ and the New DArray Library




Journal Article · July 21, 2016 · Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS)

DOI: https://doi.org/10.1109/IPDPS.2016.108 · OSTI ID: 1379522

Ozog, David [1]; Kamil, Amir [2]; Zheng, Yili [2]; Hargrove, Paul [2]; Hammond, Jeff R. [3]; Malony, Allen [1]; de Jong, Wibe [2]; Yelick, Kathy [2]

[1] Univ. of Oregon, Eugene, OR (United States)
[2] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
[3] Intel Corporation, Portland, OR (United States)

The Hartree-Fock (HF) method is the fundamental first step for incorporating quantum mechanics into many-electron simulations of atoms and molecules, and it is an important component of computational chemistry toolkits like NWChem. The GTFock code is an HF implementation that, while it does not have all the features of NWChem, represents crucial algorithmic advances that reduce communication and improve load balance by doing an up-front static partitioning of tasks, followed by work stealing whenever necessary. To enable innovations in algorithms and exploit next-generation exascale systems, it is crucial to support quantum chemistry codes using expressive and convenient programming models and runtime systems that are also efficient and scalable. This paper presents an HF implementation similar to GTFock using UPC++, a partitioned global address space (PGAS) model that includes flexible communication, asynchronous remote computation, and a powerful multidimensional array library. UPC++ offers runtime features that are useful for HF, such as active messages, a rich calculus for array operations, hardware-supported fetch-and-add, and functions for ensuring asynchronous runtime progress. We present a new distributed array abstraction, DArray, that is convenient for the kinds of random-access array updates and linear algebra operations performed on block-distributed arrays with irregular data ownership. Finally, we analyze the performance of atomic fetch-and-add operations (relevant for load balancing) and of runtime attentiveness, then compare various techniques and optimizations for each. Our optimized implementation of HF using UPC++ and the DArray library shows up to 20% improvement over GTFock with Global Arrays at scales up to 24,000 cores.


Research Organization: Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: AC02-05CH11231; SC0006723; SC0012381; SC0005360

OSTI ID: 1379522

Journal Information: Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS), Conference: 2016 IEEE 30th International Parallel and Distributed Processing Symposium, Chicago, IL (United States), 23-27 May 2016; ISSN 1530-2075

Publisher: IEEE

Country of Publication: United States

Language: English





Related Subjects

97 MATHEMATICS AND COMPUTING
Arrays
Libraries
Electronics packaging
Hafnium
Computational modeling
Programming
attentiveness
Hartree-Fock
self-consistent field (SCF)
quantum chemistry
PGAS
UPC/UPC++
Global Arrays
performance analysis
load balancing
work stealing
