OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures.

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.2943 · OSTI ID: 1564926
 [1]; ;  [2];  [1];  [2];  [3]
  1. Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
  2. Swiss National Supercomputing Center, Manno, Switzerland
  3. Sandia National Laboratories, Albuquerque, New Mexico, USA

Performance improvements in biomolecular simulations based on molecular dynamics (MD) codes are widely desired. Unfortunately, the factors that allowed past performance improvements, particularly microprocessor clock frequencies, are no longer increasing. Hence, novel software and hardware solutions are being explored to accelerate widely used MD codes. In this paper, we describe our efforts to port, optimize and tune the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a popular MD framework, on heterogeneous architectures: multi-core processors with graphics processing unit (GPU) accelerators. Our implementation accelerates the most computationally expensive non-bonded interaction terms on the GPUs and overlaps the computation on the CPU and GPUs. This functionality is built on top of the Message Passing Interface (MPI), which allows multi-level parallelism to be extracted even at the workstation level with multi-core CPUs and allows the implementation to be extended to GPU-enabled clusters. We hypothesize that the optimal benefit of heterogeneous architectures for applications will come from utilizing all available resources (for example, CPU cores and GPU devices on GPU-enabled clusters). Benchmarks for a range of biomolecular system sizes are provided, and an analysis is performed on four generations of NVIDIA's GPU devices. On GPU-enabled Linux clusters, by overlapping and pipelining computation and communication, we observe up to 10-fold application acceleration in multi-core and multi-GPU environments.
A detailed analysis of the implementation identifies bottlenecks in the algorithm, indicating that code optimization and improvements on GPUs could allow microsecond-scale simulation throughput on workstations and inexpensive GPU clusters, putting widely desired, biologically relevant simulation time-scales within reach of a large user community. To systematically optimize simulation throughput and to enable performance prediction, we have developed a parameterized performance model that allows developers and users to explore the performance potential of future heterogeneous systems for biological simulations. Copyright © 2012 John Wiley & Sons, Ltd.
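The abstract does not give the form of the parameterized performance model, so the following is only an illustrative sketch of why overlapping CPU and GPU work can raise throughput. It assumes a deliberately simple model in which CPU and GPU work run fully concurrently while transfers and communication stay exposed. All names (`StepCosts`, `step_time_serial`, `step_time_overlapped`, `ns_per_day`), the cost breakdown, and the 2 fs timestep are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepCosts:
    """Hypothetical per-timestep costs in seconds (illustrative, not from the paper)."""
    cpu_bonded: float     # bonded terms and integration on the CPU cores
    gpu_nonbonded: float  # non-bonded interaction kernels on the GPU
    transfer: float       # host/device copies that are not hidden by overlap
    mpi_comm: float       # inter-process communication that is not hidden


def step_time_serial(c: StepCosts) -> float:
    """Time per step when CPU and GPU work run one after the other."""
    return c.cpu_bonded + c.gpu_nonbonded + c.transfer + c.mpi_comm


def step_time_overlapped(c: StepCosts) -> float:
    """Time per step when CPU and GPU work run concurrently.

    Assumption: the overlap is perfect, so only the slower of the two
    compute phases contributes; transfers and MPI remain exposed.
    """
    return max(c.cpu_bonded, c.gpu_nonbonded) + c.transfer + c.mpi_comm


def ns_per_day(step_seconds: float, timestep_fs: float = 2.0) -> float:
    """Convert wall-clock seconds per step into simulated nanoseconds per day."""
    steps_per_day = 86400.0 / step_seconds
    return steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns


if __name__ == "__main__":
    costs = StepCosts(cpu_bonded=0.004, gpu_nonbonded=0.006,
                      transfer=0.001, mpi_comm=0.001)
    serial = step_time_serial(costs)
    overlapped = step_time_overlapped(costs)
    print(f"speedup from overlap: {serial / overlapped:.2f}x")
    print(f"throughput with overlap: {ns_per_day(overlapped):.1f} ns/day")
```

In this toy model the benefit of overlap is largest when the CPU and GPU phases take similar time, and it shrinks as the exposed transfer and communication terms grow. That is one way to read the abstract's emphasis on using all available CPU cores and GPU devices together.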

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Lockheed Martin Corporation, Littleton, CO (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1564926
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 25, Issue 10; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
