DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

Abstract

The Hartree–Fock method in the General Atomic and Molecular Electronic Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on an Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.
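To make the strategy in the abstract concrete, the sketch below shows how OpenMP threading inside each MPI rank lets all threads on a node share a single density and Fock matrix copy, which is the source of both the speedup and the memory savings the abstract reports. This is a minimal hypothetical illustration in C, not GAMESS code: the integral kernel eri(), the basis size N, the loop-level work distribution, and the Coulomb-only Fock update are all placeholder assumptions.

/* Hypothetical hybrid MPI/OpenMP Fock-build sketch; eri(), N, and the
 * work distribution are illustrative placeholders, not GAMESS code. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 64  /* number of basis functions (placeholder size) */

/* Stand-in for a real electron repulsion integral kernel. */
static double eri(int i, int j, int k, int l) {
    return 1.0 / (1.0 + abs(i - j) + abs(k - l));
}

int main(int argc, char **argv) {
    int provided, rank, nranks;
    /* FUNNELED suffices: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* One shared density (D) and Fock (F) copy per rank; threading the
       rank avoids one copy per core, hence the memory-footprint win. */
    double *D = calloc((size_t)N * N, sizeof *D);
    double *F = calloc((size_t)N * N, sizeof *F);
    for (int i = 0; i < N * N; ++i) D[i] = 1e-3;

    for (int i = rank; i < N; i += nranks) {          /* MPI partition    */
        #pragma omp parallel for schedule(dynamic)    /* thread partition */
        for (int j = 0; j <= i; ++j)
            for (int k = 0; k < N; ++k)
                for (int l = 0; l <= k; ++l) {
                    double g = eri(i, j, k, l);
                    /* Coulomb term only, for brevity; each quartet
                       updates two F elements, so writes to the shared
                       Fock matrix need atomics. */
                    #pragma omp atomic
                    F[i * N + j] += D[k * N + l] * g;
                    #pragma omp atomic
                    F[k * N + l] += D[i * N + j] * g;
                }
    }
    /* Sum the per-rank partial Fock matrices. */
    MPI_Allreduce(MPI_IN_PLACE, F, N * N, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0) printf("F[0][0] = %.6f\n", F[0]);
    free(D);
    free(F);
    MPI_Finalize();
    return 0;
}

Compile with, for example, mpicc -fopenmp. The schedule(dynamic) clause mirrors the dynamic load balancing such codes need, since per-quartet integral cost varies strongly; this is exactly the irregularity the abstract highlights.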

Authors:
 Mironov, Vladimir [1];  Moskovsky, Alexander [2];  D’Mello, Michael [3];  Alexeev, Yuri [4]
  1. Lomonosov Moscow State Univ., Moscow (Russian Federation)
  2. RSC Technologies, Moscow (Russian Federation)
  3. Intel Corporation, Schaumburg, IL (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
October 4, 2017
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Scientific User Facilities Division; Intel Corporation
OSTI Identifier:
1401981
Alternate Identifier(s):
OSTI ID: 1402492
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 2017; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; GAMESS; Intel Xeon Phi; MPI; OpenMP; Parallel Hartree-Fock-Roothaan; integral computation; irregular computation; quantum chemistry

Citation Formats

Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States: N. p., 2017. Web. doi:10.1177/1094342017732628.
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, & Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States. https://doi.org/10.1177/1094342017732628
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. Wed Oct 04, 2017. "An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture". United States. https://doi.org/10.1177/1094342017732628. https://www.osti.gov/servlets/purl/1401981.
@article{osti_1401981,
title = {An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture},
author = {Mironov, Vladimir and Moskovsky, Alexander and D’Mello, Michael and Alexeev, Yuri},
abstractNote = {The Hartree–Fock method in the General Atomic and Molecular Electronic Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on an Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.},
doi = {10.1177/1094342017732628},
journal = {International Journal of High Performance Computing Applications},
volume = 2017,
place = {United States},
year = {2017},
month = {oct}
}

Journal Article:
Free publicly available full text (publisher's version of record)

Citation Metrics:
Cited by: 12 works
Citation information provided by Web of Science


Works referenced in this record:

Acceleration of the GAMESS-UK electronic structure package on graphical processing units
journal, May 2011

  • Wilkinson, Karl A.; Sherwood, Paul; Guest, Martyn F.
  • Journal of Computational Chemistry, Vol. 32, Issue 10
  • DOI: 10.1002/jcc.21815

Efficient electronic integrals and their generalized derivatives for object oriented implementations of electronic structure calculations
journal, December 2008

  • Flocke, N.; Lotrich, V.
  • Journal of Computational Chemistry, Vol. 29, Issue 16
  • DOI: 10.1002/jcc.21018

One- and two-electron integrals over cartesian gaussian functions
journal, February 1978

  • McMurchie, Larry E.; Davidson, Ernest R.
  • Journal of Computational Physics, Vol. 26, Issue 2
  • DOI: 10.1016/0021-9991(78)90092-X

MPI/OpenMP Hybrid Parallel Algorithm for Hartree–Fock Calculations
journal, March 2010

  • Ishimura, Kazuya; Kuramoto, Kei; Ikuta, Yasuhiro
  • Journal of Chemical Theory and Computation, Vol. 6, Issue 4
  • DOI: 10.1021/ct100083w

The Performance Characterization of the RSC PetaStream Module
book, January 2014


General atomic and molecular electronic structure system
journal, November 1993

  • Schmidt, Michael W.; Baldridge, Kim K.; Boatz, Jerry A.
  • Journal of Computational Chemistry, Vol. 14, Issue 11, p. 1347-1363
  • DOI: 10.1002/jcc.540141112

Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation
journal, January 2008

  • Ufimtsev, Ivan S.; Martínez, Todd J.
  • Journal of Chemical Theory and Computation, Vol. 4, Issue 2
  • DOI: 10.1021/ct700268q

The Heuristic Static Load-Balancing Algorithm Applied to the Community Earth System Model
conference, May 2014

  • Alexeev, Yuri; Mickelson, Sheri; Leyffer, Sven
  • 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
  • DOI: 10.1109/IPDPSW.2014.177

Thread-level parallelization and optimization of NWChem for the Intel MIC architecture
conference, January 2015

  • Shan, Hongzhang; Williams, Samuel; de Jong, Wibe
  • Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15
  • DOI: 10.1145/2712386.2712391

Macroscale superlubricity enabled by graphene nanoscroll formation
journal, May 2015

  • Berman, D.; Deshmukh, S. A.; Sankaranarayanan, S. K. R. S.
  • Science, Vol. 348, Issue 6239
  • DOI: 10.1126/science.1262024

Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms
book, January 2014


Extending the Power of Quantum Chemistry to Large Systems with the Fragment Molecular Orbital Method
journal, August 2007

  • Fedorov, Dmitri G.; Kitaura, Kazuo
  • The Journal of Physical Chemistry A, Vol. 111, Issue 30
  • DOI: 10.1021/jp0716740

Parallelization of SCF calculations within Q-Chem
journal, June 2000


Toward high-performance computational chemistry: II. A scalable self-consistent field program
journal, January 1996


A parallel distributed data CPHF algorithm for analytic Hessians
journal, January 2007

  • Alexeev, Yuri; Schmidt, Michael W.; Windus, Theresa L.
  • Journal of Computational Chemistry, Vol. 28, Issue 10
  • DOI: 10.1002/jcc.20633

A New Scalable Parallel Algorithm for Fock Matrix Construction
conference, May 2014

  • Liu, Xing; Patel, Aftab; Chow, Edmond
  • 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2014.97

Libcint: An efficient general integral library for Gaussian basis functions
journal, June 2015

  • Sun, Qiming
  • Journal of Computational Chemistry, Vol. 36, Issue 22
  • DOI: 10.1002/jcc.23981

Horizontal vectorization of electron repulsion integrals
journal, September 2016

  • Pritchard, Benjamin P.; Chow, Edmond
  • Journal of Computational Chemistry, Vol. 37, Issue 28
  • DOI: 10.1002/jcc.24483

Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation
journal, March 2009

  • Ufimtsev, Ivan S.; Martinez, Todd J.
  • Journal of Chemical Theory and Computation, Vol. 5, Issue 4
  • DOI: 10.1021/ct800526s

New Multithreaded Hybrid CPU/GPU Approach to Hartree–Fock
journal, September 2012

  • Asadchev, Andrey; Gordon, Mark S.
  • Journal of Chemical Theory and Computation, Vol. 8, Issue 11
  • DOI: 10.1021/ct300526w

NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations
journal, September 2010

  • Valiev, M.; Bylaska, E. J.; Govind, N.
  • Computer Physics Communications, Vol. 181, Issue 9, p. 1477-1489
  • DOI: 10.1016/j.cpc.2010.04.018

Efficient recursive computation of molecular integrals over Cartesian Gaussian functions
journal, April 1986

  • Obara, S.; Saika, A.
  • The Journal of Chemical Physics, Vol. 84, Issue 7
  • DOI: 10.1063/1.450106

Evaluation of molecular integrals over Gaussian basis functions
journal, July 1976

  • Dupuis, Michel; Rys, John; King, Harry F.
  • The Journal of Chemical Physics, Vol. 65, Issue 1
  • DOI: 10.1063/1.432807

Works referencing / citing this record:

Multithreaded parallelization of the energy and analytic gradient in the fragment molecular orbital method
journal, March 2019

  • Mironov, Vladimir; Alexeev, Yuri; Fedorov, Dmitri G.
  • International Journal of Quantum Chemistry, Vol. 119, Issue 12
  • DOI: 10.1002/qua.25937