An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture
Abstract
The Hartree–Fock method in the General Atomic and Molecular Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.
- Authors:
- Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; Alexeev, Yuri
- Lomonosov Moscow State Univ., Moscow (Russian Federation)
- RSC Technologies, Moscow (Russian Federation)
- Intel Corporation, Schaumburg, IL (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Publication Date:
- Research Org.:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Scientific User Facilities Division; Intel Corporation
- OSTI Identifier:
- 1401981
- Alternate Identifier(s):
- OSTI ID: 1402492
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Accepted Manuscript
- Journal Name:
- International Journal of High Performance Computing Applications
- Additional Journal Information:
- Journal Volume: 2017; Journal ID: ISSN 1094-3420
- Publisher:
- SAGE
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; GAMESS; Intel Xeon Phi; MPI; OpenMP; Parallel Hartree-Fock-Roothaan; integral computation; irregular computation; quantum chemistry
Citation Formats
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States: N. p., 2017. Web. doi:10.1177/1094342017732628.
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, & Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States. https://doi.org/10.1177/1094342017732628
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. Wed . "An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture". United States. https://doi.org/10.1177/1094342017732628. https://www.osti.gov/servlets/purl/1401981.
@article{osti_1401981,
title = {An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture},
author = {Mironov, Vladimir and Moskovsky, Alexander and D’Mello, Michael and Alexeev, Yuri},
abstractNote = {The Hartree–Fock method in the General Atomic and Molecular Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.},
doi = {10.1177/1094342017732628},
journal = {International Journal of High Performance Computing Applications},
volume = 2017,
place = {United States},
year = {2017},
month = {10}
}