Optimizing legacy molecular dynamics software with directive-based offload

Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.

doi:10.1016/j.cpc.2015.05.004


Abstract

Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We also demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU, in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS. © 2015 Elsevier B.V. All rights reserved.

Authors:

Michael Brown, W. ^[1]; Carrillo, Jan-Michael Y. ^[2]; Gavhane, Nitin ^[3]; Thakkar, Foram M. ^[3]; Plimpton, Steven J. ^[4]

1. Intel Corporation, Portland, OR (United States)
2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
3. Shell India Markets Private Limited, Bangalore (India)
4. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Publication Date: 2015-05-14

Research Org.: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Org.: USDOE Office of Science (SC)

OSTI Identifier: 1261448

Alternate Identifier(s): OSTI ID: 1246679

Grant/Contract Number: AC05-00OR22725; AC04-94AL85000

Resource Type: Accepted Manuscript

Journal Name: Computer Physics Communications

Additional Journal Information: Journal Volume: 195; Journal Issue: C; Journal ID: ISSN 0010-4655

Publisher: Elsevier

Country of Publication: United States

Language: English

Subject: 97 MATHEMATICS AND COMPUTING; Molecular dynamics; Xeon Phi; GPU; Coprocessor; Accelerator; Many-core; PERFORMANCE; POTENTIALS; MORPHOLOGY

Citation Formats


Michael Brown, W., Carrillo, Jan-Michael Y., Gavhane, Nitin, Thakkar, Foram M., and Plimpton, Steven J. Optimizing legacy molecular dynamics software with directive-based offload. United States: N. p., 2015. Web. doi:10.1016/j.cpc.2015.05.004.



Michael Brown, W., Carrillo, Jan-Michael Y., Gavhane, Nitin, Thakkar, Foram M., & Plimpton, Steven J. Optimizing legacy molecular dynamics software with directive-based offload. United States. https://doi.org/10.1016/j.cpc.2015.05.004



Michael Brown, W., Carrillo, Jan-Michael Y., Gavhane, Nitin, Thakkar, Foram M., and Plimpton, Steven J. 2015. "Optimizing legacy molecular dynamics software with directive-based offload". United States. https://doi.org/10.1016/j.cpc.2015.05.004. https://www.osti.gov/servlets/purl/1261448.



                    
@article{osti_1261448,
  title        = {Optimizing legacy molecular dynamics software with directive-based offload},
  author       = {Michael Brown, W. and Carrillo, Jan-Michael Y. and Gavhane, Nitin and Thakkar, Foram M. and Plimpton, Steven J.},
  abstractNote = {Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We also demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU, in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel (R) Xeon Phi (TM) coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS. (C) 2015 Elsevier B.V. All rights reserved.},
  doi          = {10.1016/j.cpc.2015.05.004},
  journal      = {Computer Physics Communications},
  number       = {C},
  volume       = {195},
  place        = {United States},
  year         = {2015},
  month        = {5}
}


Journal Article:


Publisher's Version of Record

https://doi.org/10.1016/j.cpc.2015.05.004


Citation Metrics:

Cited by: 26 works

Citation information provided by Web of Science


Works referenced in this record:

Implementing molecular dynamics on hybrid high performance computers – short range forces
journal, April 2011

Brown, W. Michael; Wang, Peng; Plimpton, Steven J.
Computer Physics Communications, Vol. 182, Issue 4
DOI: 10.1016/j.cpc.2010.12.021

Implementing molecular dynamics on hybrid high performance computers – Particle–particle particle-mesh
journal, March 2012

Brown, W. Michael; Kohlmeyer, Axel; Plimpton, Steven J.
Computer Physics Communications, Vol. 183, Issue 3
DOI: 10.1016/j.cpc.2011.10.012

An Evaluation of Molecular Dynamics Performance on the Hybrid Cray XK6 Supercomputer
journal, January 2012

Michael Brown, W.; Nguyen, Trung D.; Fuentes-Cabrera, Miguel
Procedia Computer Science, Vol. 9
DOI: 10.1016/j.procs.2012.04.020

Implementing molecular dynamics on hybrid high performance computers—Three-body potentials
journal, December 2013

Brown, W. Michael; Yamada, Masako
Computer Physics Communications, Vol. 184, Issue 12
DOI: 10.1016/j.cpc.2013.08.002

Fast Parallel Algorithms for Short-Range Molecular Dynamics
journal, March 1995

Plimpton, Steve
Journal of Computational Physics, Vol. 117, Issue 1
DOI: 10.1006/jcph.1995.1039

A flexible algorithm for calculating pair interactions on SIMD architectures
journal, December 2013

Páll, Szilárd; Hess, Berk
Computer Physics Communications, Vol. 184, Issue 12
DOI: 10.1016/j.cpc.2013.06.003

A generalized Gay-Berne intermolecular potential for biaxial particles
journal, April 1995

Berardi, R.; Fava, C.; Zannoni, C.
Chemical Physics Letters, Vol. 236, Issue 4-5
DOI: 10.1016/0009-2614(95)00212-M

All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins ^†
journal, April 1998

MacKerell, A. D.; Bashford, D.; Bellott, M.
The Journal of Physical Chemistry B, Vol. 102, Issue 18
DOI: 10.1021/jp973084f

Quiet high-resolution computer models of a plasma
journal, February 1974

Hockney, R. W.; Goel, S. P.; Eastwood, J. W.
Journal of Computational Physics, Vol. 14, Issue 2
DOI: 10.1016/0021-9991(74)90010-2

Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes
journal, March 1977

Ryckaert, Jean-Paul; Ciccotti, Giovanni; Berendsen, Herman J. C.
Journal of Computational Physics, Vol. 23, Issue 3
DOI: 10.1016/0021-9991(77)90098-5

Molecular Dynamics Simulation of Dark-adapted Rhodopsin in an Explicit Membrane Bilayer: Coupling between Local Retinal and Larger Scale Conformational Change
journal, October 2003

Crozier, Paul S.; Stevens, Mark J.; Forrest, Lucy R.
Journal of Molecular Biology, Vol. 333, Issue 3
DOI: 10.1016/j.jmb.2003.08.045

On the morphology of polymer-based photovoltaics
journal, March 2012

Liu, Feng; Gu, Yu; Jung, Jae Woong
Journal of Polymer Science Part B: Polymer Physics, Vol. 50, Issue 15
DOI: 10.1002/polb.23063

Development and testing of a general amber force field
journal, January 2004

Wang, Junmei; Wolf, Romain M.; Caldwell, James W.
Journal of Computational Chemistry, Vol. 25, Issue 9
DOI: 10.1002/jcc.20035

Modelling of P3HT:PCBM interface using coarse-grained forcefield derived from accurate atomistic forcefield
journal, January 2014

To, T. T.; Adams, S.
Physical Chemistry Chemical Physics, Vol. 16, Issue 10
DOI: 10.1039/c3cp54308k

Liquid crystal nanodroplets in solution
journal, January 2009

Brown, W. Michael; Petersen, Matt K.; Plimpton, Steven J.
The Journal of Chemical Physics, Vol. 130, Issue 4
DOI: 10.1063/1.3058435

Rupture mechanism of liquid crystal thin films realized by large-scale molecular simulations
journal, January 2014

Nguyen, Trung Dac; Carrillo, Jan-Michael Y.; Matheson, Michael A.
Nanoscale, Vol. 6, Issue 6
DOI: 10.1039/C3NR05413F

Computational aspects of many-body potentials
journal, May 2012

Plimpton, Steven J.; Thompson, Aidan P.
MRS Bulletin, Vol. 37, Issue 5
DOI: 10.1557/mrs.2012.96

Works referencing / citing this record:

Design considerations for GPU-aware collective communications in MPI
journal, May 2018

Faraji, Iman; Afsahi, Ahmad
Concurrency and Computation: Practice and Experience, Vol. 30, Issue 17
DOI: 10.1002/cpe.4667

Accelerating AIREBO: Navigating the Journey from Legacy to High‐Performance Code
journal, February 2019

Höhnerbach, Markus; Bientinesi, Paolo
Journal of Computational Chemistry, Vol. 40, Issue 14
DOI: 10.1002/jcc.25796

Scaling molecular dynamics beyond 100,000 processor cores for large‐scale biophysical simulations
journal, April 2019

Jung, Jaewoon; Nishima, Wataru; Daniels, Marcus
Journal of Computational Chemistry, Vol. 40, Issue 21
DOI: 10.1002/jcc.25840

An Efficient Optimization of Hll Method for the Second Generation of Intel Xeon Phi Processor
journal, May 2018

Kulikov, I. M.; Chernykh, I. G.; Glinskiy, B. M.
Lobachevskii Journal of Mathematics, Vol. 39, Issue 4
DOI: 10.1134/s1995080218040091

Numerical Modeling of Hydrodynamic Turbulence with Self-gravity on Intel Xeon Phi KNL
book, August 2019

Kulikov, Igor; Chernykh, Igor; Berendeev, Evgeny
Parallel Computational Technologies: 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2–4, 2019, Revised Selected Papers, p. 309-322
DOI: 10.1007/978-3-030-28163-2_22

Using Compiler Directives for Performance Portability in Scientific Computing: Kernels from Molecular Simulation
book, January 2019

Sedova, Ada; Tillack, Andreas F.; Tharrington, Arnold
Accelerator Programming Using Directives
DOI: 10.1007/978-3-030-12274-4_2

