Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures.

Agarwal, Pratul K.; Hampton, Scott; Poznanovic, Jeffrey; Ramanathan, Arvind; Alam, Sadaf R.; Crozier, Paul S.

doi:10.1002/cpe.2943

Journal Article · Tue Oct 23 2012 · Concurrency and Computation: Practice and Experience

DOI: https://doi.org/10.1002/cpe.2943 · OSTI ID: 1564926

Agarwal, Pratul K. [1]; Hampton, Scott; Poznanovic, Jeffrey [2]; Ramanathan, Arvind [1]; Alam, Sadaf R. [2]; Crozier, Paul S. [3]

[1] Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
[2] Swiss National Supercomputing Center, Manno, Switzerland
[3] Sandia National Laboratories, Albuquerque, New Mexico, USA

Performance improvements in biomolecular simulations based on molecular dynamics (MD) codes are widely desired. Unfortunately, the factors that allowed past performance improvements, particularly microprocessor clock frequencies, are no longer increasing. Hence, novel software and hardware solutions are being explored for accelerating the performance of widely used MD codes. In this paper, we describe our efforts in porting, optimizing, and tuning the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a popular MD framework, on heterogeneous architectures: multi-core processors with graphics processing unit (GPU) accelerators. Our implementation is based on accelerating the most computationally expensive non-bonded interaction terms on the GPUs and overlapping the computation on the CPU and GPUs. This functionality is built on top of the Message Passing Interface (MPI), which allows multi-level parallelism to be extracted even at the workstation level with multi-core CPUs and allows the implementation to be extended to GPU-enabled clusters. We hypothesize that the optimal benefit of heterogeneous architectures for applications will come from utilizing all possible resources (for example, CPU cores and GPU devices on GPU-enabled clusters). Benchmarks for a range of biomolecular system sizes are provided, and an analysis is performed on four generations of NVIDIA's GPU devices. On GPU-enabled Linux clusters, by overlapping and pipelining computation and communication, we observe up to 10-fold application acceleration in multi-core and multi-GPU environments, illustrating significant performance improvements. A detailed analysis of the implementation is presented that allows identification of bottlenecks in the algorithm, indicating that code optimization and improvements on GPUs could allow microsecond-scale simulation throughput on workstations and inexpensive GPU clusters, putting widely desired biologically relevant simulation time-scales within reach of a large user community. To systematically optimize simulation throughput and to enable performance prediction, we have developed a parameterized performance model that will allow developers and users to explore the performance potential of future heterogeneous systems for biological simulations. Copyright © 2012 John Wiley & Sons, Ltd.
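The abstract does not give the form of the parameterized performance model, so the following is only a minimal sketch of what such a model might look like. It assumes that the non-bonded work is offloaded to the GPU and overlaps with the remaining CPU work and MPI communication, so the step time is the longer of the two paths. The class `StepCosts`, its fields, the function names, and all numeric values are hypothetical illustrations, not quantities taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepCosts:
    """Hypothetical per-timestep costs in seconds (illustrative only)."""
    nonbonded_cpu: float  # non-bonded force time if computed on the CPU cores
    other_cpu: float      # bonded terms, integration, etc. kept on the CPU
    comm: float           # MPI communication per step
    gpu_speedup: float    # assumed GPU speedup for the non-bonded kernel
    transfer: float       # host<->device data transfer per step


def cpu_only_step(c: StepCosts) -> float:
    """Step time when everything runs serially on the CPU."""
    return c.nonbonded_cpu + c.other_cpu + c.comm


def overlapped_step(c: StepCosts) -> float:
    """Step time when the GPU path overlaps with CPU work and communication.

    Assumes perfect overlap, so the slower of the two paths sets the step time.
    """
    gpu_path = c.nonbonded_cpu / c.gpu_speedup + c.transfer
    cpu_path = c.other_cpu + c.comm
    return max(gpu_path, cpu_path)


def ns_per_day(step_seconds: float, timestep_fs: float = 1.0) -> float:
    """Simulated nanoseconds per wall-clock day for a given step time."""
    steps_per_day = 86400.0 / step_seconds
    return steps_per_day * timestep_fs * 1e-6  # 1 ns = 1e6 fs


if __name__ == "__main__":
    costs = StepCosts(nonbonded_cpu=0.09, other_cpu=0.008, comm=0.002,
                      gpu_speedup=10.0, transfer=0.001)
    speedup = cpu_only_step(costs) / overlapped_step(costs)
    print(f"modeled speedup: {speedup:.1f}x")
    print(f"modeled throughput: {ns_per_day(overlapped_step(costs)):.2f} ns/day")
```

Under these assumptions, the model makes the bottleneck explicit: once the GPU path is shorter than the CPU path, further GPU speedup no longer reduces the step time, and the remaining CPU work and communication dominate.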

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Lockheed Martin Corporation, Littleton, CO (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC04-94AL85000

OSTI ID: 1564926

Journal Information: Concurrency and Computation: Practice and Experience, Vol. 25, Issue 10; ISSN 1532-0626

Publisher: Wiley

Country of Publication: United States

Language: English

Related Subjects

Computer Science
