DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

Abstract

The Hartree–Fock method in the General Atomic and Molecular Electronic Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on an Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.
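To make the strategy in the abstract concrete, the sketch below shows how OpenMP threading inside each MPI rank lets all threads on a node share a single density and Fock matrix copy, which is the source of both the speedup and the memory savings the abstract reports. This is a minimal hypothetical illustration in C, not GAMESS code: the integral kernel eri(), the basis size N, the loop-level work distribution, and the Coulomb-only Fock update are all placeholder assumptions.

/* Hypothetical hybrid MPI/OpenMP Fock-build sketch; eri(), N, and the
 * work distribution are illustrative placeholders, not GAMESS code. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 64  /* number of basis functions (placeholder size) */

/* Stand-in for a real electron repulsion integral kernel. */
static double eri(int i, int j, int k, int l) {
    return 1.0 / (1.0 + abs(i - j) + abs(k - l));
}

int main(int argc, char **argv) {
    int provided, rank, nranks;
    /* FUNNELED suffices: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* One shared density (D) and Fock (F) copy per rank; threading the
       rank avoids one copy per core, hence the memory-footprint win. */
    double *D = calloc((size_t)N * N, sizeof *D);
    double *F = calloc((size_t)N * N, sizeof *F);
    for (int i = 0; i < N * N; ++i) D[i] = 1e-3;

    for (int i = rank; i < N; i += nranks) {          /* MPI partition    */
        #pragma omp parallel for schedule(dynamic)    /* thread partition */
        for (int j = 0; j <= i; ++j)
            for (int k = 0; k < N; ++k)
                for (int l = 0; l <= k; ++l) {
                    double g = eri(i, j, k, l);
                    /* Coulomb term only, for brevity; each quartet
                       updates two F elements, so writes to the shared
                       Fock matrix need atomics. */
                    #pragma omp atomic
                    F[i * N + j] += D[k * N + l] * g;
                    #pragma omp atomic
                    F[k * N + l] += D[i * N + j] * g;
                }
    }
    /* Sum the per-rank partial Fock matrices. */
    MPI_Allreduce(MPI_IN_PLACE, F, N * N, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0) printf("F[0][0] = %.6f\n", F[0]);
    free(D);
    free(F);
    MPI_Finalize();
    return 0;
}

Compile with, for example, mpicc -fopenmp. The schedule(dynamic) clause mirrors the dynamic load balancing such codes need, since per-quartet integral cost varies strongly; this is exactly the irregularity the abstract highlights.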

Authors:
 Mironov, Vladimir [1];  Moskovsky, Alexander [2];  D’Mello, Michael [3];  Alexeev, Yuri [4]
  1. Lomonosov Moscow State Univ., Moscow (Russian Federation)
  2. RSC Technologies, Moscow (Russian Federation)
  3. Intel Corporation, Schaumburg, IL (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
October 4, 2017
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Scientific User Facilities Division; Intel Corporation
OSTI Identifier:
1401981
Alternate Identifier(s):
OSTI ID: 1402492
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 2017; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; GAMESS; Intel Xeon Phi; MPI; OpenMP; Parallel Hartree-Fock-Roothaan; integral computation; irregular computation; quantum chemistry

Citation Formats

Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States: N. p., 2017. Web. doi:10.1177/1094342017732628.
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, & Alexeev, Yuri. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture. United States. https://doi.org/10.1177/1094342017732628
Mironov, Vladimir, Moskovsky, Alexander, D’Mello, Michael, and Alexeev, Yuri. Wed Oct 04, 2017. "An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture". United States. https://doi.org/10.1177/1094342017732628. https://www.osti.gov/servlets/purl/1401981.
@article{osti_1401981,
title = {An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture},
author = {Mironov, Vladimir and Moskovsky, Alexander and D’Mello, Michael and Alexeev, Yuri},
abstractNote = {The Hartree–Fock method in the General Atomic and Molecular Electronic Structure System (GAMESS) quantum chemistry package represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals and the building of the Fock matrix. These are the central components of the main self-consistent field (SCF) loop, the key hot spot in electronic structure codes. By threading the Message Passing Interface (MPI) ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4× to 6× for large systems) but also achieve a significant (>2×) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on an Intel® Xeon Phi™ supercomputer. Scaling numbers are reported on up to 7680 cores on Intel Xeon Phi coprocessors.},
doi = {10.1177/1094342017732628},
journal = {International Journal of High Performance Computing Applications},
volume = 2017,
place = {United States},
year = {2017},
month = {oct}
}

Journal Article:
Free publicly available full text (publisher's version of record)

Citation Metrics:
Cited by: 12 works
Citation information provided by Web of Science


Works referenced in this record:

Acceleration of the GAMESS-UK electronic structure package on graphical processing units
journal, May 2011

  • Wilkinson, Karl A.; Sherwood, Paul; Guest, Martyn F.
  • Journal of Computational Chemistry, Vol. 32, Issue 10
  • DOI: 10.1002/jcc.21815

Efficient electronic integrals and their generalized derivatives for object oriented implementations of electronic structure calculations
journal, December 2008

  • Flocke, N.; Lotrich, V.
  • Journal of Computational Chemistry, Vol. 29, Issue 16
  • DOI: 10.1002/jcc.21018

One- and two-electron integrals over cartesian gaussian functions
journal, February 1978

  • McMurchie, Larry E.; Davidson, Ernest R.
  • Journal of Computational Physics, Vol. 26, Issue 2
  • DOI: 10.1016/0021-9991(78)90092-X

MPI/OpenMP Hybrid Parallel Algorithm for Hartree–Fock Calculations
journal, March 2010

  • Ishimura, Kazuya; Kuramoto, Kei; Ikuta, Yasuhiro
  • Journal of Chemical Theory and Computation, Vol. 6, Issue 4
  • DOI: 10.1021/ct100083w

The Performance Characterization of the RSC PetaStream Module
book, January 2014


General atomic and molecular electronic structure system
journal, November 1993

  • Schmidt, Michael W.; Baldridge, Kim K.; Boatz, Jerry A.
  • Journal of Computational Chemistry, Vol. 14, Issue 11, p. 1347-1363
  • DOI: 10.1002/jcc.540141112

Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation
journal, January 2008

  • Ufimtsev, Ivan S.; Martínez, Todd J.
  • Journal of Chemical Theory and Computation, Vol. 4, Issue 2
  • DOI: 10.1021/ct700268q

The Heuristic Static Load-Balancing Algorithm Applied to the Community Earth System Model
conference, May 2014

  • Alexeev, Yuri; Mickelson, Sheri; Leyffer, Sven
  • 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
  • DOI: 10.1109/IPDPSW.2014.177

Thread-level parallelization and optimization of NWChem for the Intel MIC architecture
conference, January 2015

  • Shan, Hongzhang; Williams, Samuel; de Jong, Wibe
  • Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15
  • DOI: 10.1145/2712386.2712391

Macroscale superlubricity enabled by graphene nanoscroll formation
journal, May 2015

  • Berman, D.; Deshmukh, S. A.; Sankaranarayanan, S. K. R. S.
  • Science, Vol. 348, Issue 6239
  • DOI: 10.1126/science.1262024

Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms
book, January 2014


Extending the Power of Quantum Chemistry to Large Systems with the Fragment Molecular Orbital Method
journal, August 2007

  • Fedorov, Dmitri G.; Kitaura, Kazuo
  • The Journal of Physical Chemistry A, Vol. 111, Issue 30
  • DOI: 10.1021/jp0716740

Parallelization of SCF calculations within Q-Chem
journal, June 2000


Toward high-performance computational chemistry: II. A scalable self-consistent field program
journal, January 1996


A parallel distributed data CPHF algorithm for analytic Hessians
journal, January 2007

  • Alexeev, Yuri; Schmidt, Michael W.; Windus, Theresa L.
  • Journal of Computational Chemistry, Vol. 28, Issue 10
  • DOI: 10.1002/jcc.20633

A New Scalable Parallel Algorithm for Fock Matrix Construction
conference, May 2014

  • Liu, Xing; Patel, Aftab; Chow, Edmond
  • 2014 IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS)
  • DOI: 10.1109/IPDPS.2014.97

Libcint: An efficient general integral library for Gaussian basis functions
journal, June 2015

  • Sun, Qiming
  • Journal of Computational Chemistry, Vol. 36, Issue 22
  • DOI: 10.1002/jcc.23981

Horizontal vectorization of electron repulsion integrals
journal, September 2016

  • Pritchard, Benjamin P.; Chow, Edmond
  • Journal of Computational Chemistry, Vol. 37, Issue 28
  • DOI: 10.1002/jcc.24483

Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent-Field Implementation
journal, March 2009

  • Ufimtsev, Ivan S.; Martinez, Todd J.
  • Journal of Chemical Theory and Computation, Vol. 5, Issue 4
  • DOI: 10.1021/ct800526s

New Multithreaded Hybrid CPU/GPU Approach to Hartree–Fock
journal, September 2012

  • Asadchev, Andrey; Gordon, Mark S.
  • Journal of Chemical Theory and Computation, Vol. 8, Issue 11
  • DOI: 10.1021/ct300526w

NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations
journal, September 2010

  • Valiev, M.; Bylaska, E. J.; Govind, N.
  • Computer Physics Communications, Vol. 181, Issue 9, p. 1477-1489
  • DOI: 10.1016/j.cpc.2010.04.018

Efficient recursive computation of molecular integrals over Cartesian Gaussian functions
journal, April 1986

  • Obara, S.; Saika, A.
  • The Journal of Chemical Physics, Vol. 84, Issue 7
  • DOI: 10.1063/1.450106

Evaluation of molecular integrals over Gaussian basis functions
journal, July 1976

  • Dupuis, Michel; Rys, John; King, Harry F.
  • The Journal of Chemical Physics, Vol. 65, Issue 1
  • DOI: 10.1063/1.432807

Works referencing / citing this record:

Multithreaded parallelization of the energy and analytic gradient in the fragment molecular orbital method
journal, March 2019

  • Mironov, Vladimir; Alexeev, Yuri; Fedorov, Dmitri G.
  • International Journal of Quantum Chemistry, Vol. 119, Issue 12
  • DOI: 10.1002/qua.25937