OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

Abstract

The Hartree–Fock method in the General Atomic and Molecular Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.
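
To make the parallelization pattern described above concrete: the speedup and the memory savings both come from threading each MPI rank with OpenMP, so that all threads in a rank share a single copy of the Fock and density matrices instead of each process holding its own. The sketch below is a minimal, hypothetical illustration of that hybrid structure in C. The eri_contract stub, the round-robin work distribution, and all names and sizes are assumptions for exposition only, not the GAMESS implementation.

#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

/* Placeholder for a real ERI kernel (e.g. Obara-Saika recursion or Rys
 * quadrature). Here it only demonstrates a thread-safe update of the
 * Fock matrix F, which all threads in the rank share. */
static void eri_contract(int ij, int kl, const double *D, double *F, int n)
{
    double v = 1e-12 * (double)(ij + kl + 1) * D[0]; /* dummy contribution */
    #pragma omp atomic
    F[(ij % n) * n + (kl % n)] += v;
}

/* Direct-SCF Fock build: shell-pair indices are dealt round-robin to MPI
 * ranks; within a rank, OpenMP threads share one copy of F and D, which
 * is where the memory-footprint reduction comes from. */
static void fock_build(const double *D, double *F, int n, int nshellpairs)
{
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* dynamic schedule because ERI batch cost is highly irregular */
    #pragma omp parallel for schedule(dynamic)
    for (int ij = rank; ij < nshellpairs; ij += nranks)
        for (int kl = 0; kl <= ij; kl++)
            eri_contract(ij, kl, D, F, n);

    /* combine the per-rank partial Fock matrices */
    MPI_Allreduce(MPI_IN_PLACE, F, n * n, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    enum { N = 64, NSHELLPAIRS = 256 };       /* arbitrary toy sizes */
    double *D = calloc(N * N, sizeof *D);
    double *F = calloc(N * N, sizeof *F);
    if (!D || !F) { MPI_Abort(MPI_COMM_WORLD, 1); }
    D[0] = 1.0;                               /* stand-in density */
    fock_build(D, F, N, NSHELLPAIRS);
    free(D);
    free(F);
    MPI_Finalize();
    return 0;
}

A real implementation faces exactly the trade-off the paper targets: per-thread Fock copies avoid the atomic updates above but multiply memory use with the thread count, while a shared Fock matrix keeps the footprint small at the cost of synchronized accumulation.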

Authors:
Mironov, Vladimir [1]; Moskovsky, Alexander [2]; D’Mello, Michael [3]; Alexeev, Yuri [4]
  1. Lomonosov Moscow State Univ., Moscow (Russian Federation)
  2. RSC Technologies, Moscow (Russian Federation)
  3. Intel Corporation, Schaumburg, IL (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
October 2017
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Scientific User Facilities Division; Intel Corporation
OSTI Identifier:
1401981
Alternate Identifier(s):
OSTI ID: 1402492
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 2017; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; GAMESS; Intel Xeon Phi; MPI; OpenMP; Parallel Hartree-Fock-Roothaan; integral computation; irregular computation; quantum chemistry

Citation Formats

Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States: N. p., 2017. Web. doi:10.1177/1094342017732628.
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, & Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States. https://doi.org/10.1177/1094342017732628
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. 2017. "An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture". United States. https://doi.org/10.1177/1094342017732628. https://www.osti.gov/servlets/purl/1401981.
@article{osti_1401981,
title = {An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture},
author = {Mironov, Vladimir and Moskovsky, Alexander and D’Mello, Michael and Alexeev, Yuri},
abstractNote = {The Hartree–Fock method in the General Atomic and Molecular Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.},
doi = {10.1177/1094342017732628},
url = {https://www.osti.gov/biblio/1401981},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
volume = {2017},
place = {United States},
year = {2017},
month = {10}
}

Works referenced in this record:

• Acceleration of the GAMESS-UK electronic structure package on graphical processing units. Journal, May 2011.
• One- and two-electron integrals over cartesian gaussian functions. Journal, February 1978.
• MPI/OpenMP Hybrid Parallel Algorithm for Hartree-Fock Calculations. Journal, March 2010.
• General atomic and molecular electronic structure system. Journal, November 1993.
• Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation. Journal, January 2008.
• Toward high-performance computational chemistry: I. Scalable Fock matrix construction algorithms. Journal, January 1996.
• The Heuristic Static Load-Balancing Algorithm Applied to the Community Earth System Model. Conference, May 2014.
• Thread-level parallelization and optimization of NWChem for the Intel MIC architecture. Conference, January 2015. Shan, Hongzhang; Williams, Samuel; de Jong, Wibe. Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15). https://doi.org/10.1145/2712386.2712391
• Macroscale superlubricity enabled by graphene nanoscroll formation. Journal, May 2015.
• Extending the Power of Quantum Chemistry to Large Systems with the Fragment Molecular Orbital Method. Journal, August 2007.
• Parallelization of SCF calculations within Q-Chem. Journal, June 2000.
• Toward high-performance computational chemistry: II. A scalable self-consistent field program. Journal, January 1996.
• A parallel distributed data CPHF algorithm for analytic Hessians. Journal, January 2007.
• A New Scalable Parallel Algorithm for Fock Matrix Construction. Conference, May 2014. Liu, Xing; Patel, Aftab; Chow, Edmond. 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS). https://doi.org/10.1109/IPDPS.2014.97
• The Distributed Data Interface in GAMESS. Journal, June 2000.
• Libcint: An efficient general integral library for Gaussian basis functions. Journal, June 2015.
• Horizontal vectorization of electron repulsion integrals. Journal, September 2016.
• Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation. Journal, March 2009.
• New Multithreaded Hybrid CPU/GPU Approach to Hartree-Fock. Journal, September 2012.
• NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations. Journal, September 2010.
• Efficient recursive computation of molecular integrals over Cartesian Gaussian functions. Journal, April 1986.
• Evaluation of molecular integrals over Gaussian basis functions. Journal, July 1976.
• The distributed data SCF. Journal, February 2002.

Works referencing / citing this record:

• Multithreaded parallelization of the energy and analytic gradient in the fragment molecular orbital method. Journal, March 2019.