OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset

Abstract

3D simulation of earthquake ground motion is one of the most challenging computational problems in science. Graphics processing units (GPUs) have emerged as an effective alternative to traditional general-purpose processors and are increasingly capable of accelerating scientific computing research. In this paper, we describe our experiences in porting AWP-ODC, a 3D finite difference seismic wave propagation code, to the latest NVIDIA GPU Fermi chipset. We completely rewrote this Fortran-based 13-point asymmetric stencil code in C and MPI-CUDA in order to take advantage of the GPU's computing capabilities. Our new CUDA code implements the asymmetric 3D stencil on Fermi so as to make the best use of GPU on-chip memory and achieve high parallel efficiency. Benchmarks on an NVIDIA Tesla M2090 demonstrated a 10x speedup over the original, fully optimized AWP-ODC Fortran MPI code running on a single Intel Nehalem 2.4 GHz CPU socket (4 cores/CPU), and a 15x speedup over the same MPI code running on a single AMD Istanbul 2.6 GHz CPU socket (6 cores/CPU). A sustained single-GPU performance of 143.8 GFLOPS in single precision was measured for a 128x128x960 test mesh.

Authors:
 Zhou, Jun [1]; Unat, Didem [1]; Choi, Dong Ju [1]; Guest, Clark C. [1]; Cui, Yifeng [1]
  1. Univ. of California, San Diego, CA (United States)
Publication Date:
June 2012
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567289
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Procedia Computer Science
Additional Journal Information:
Journal Volume: 9; Journal Issue: C; Journal ID: ISSN 1877-0509
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 42 ENGINEERING; Earthquake Simulation; 3D Stencil Computation; Performance Tuning; NVIDIA GPU Fermi; CUDA

Citation Formats

Zhou, Jun, Unat, Didem, Choi, Dong Ju, Guest, Clark C., and Cui, Yifeng. Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset. United States: N. p., 2012. Web. doi:10.1016/j.procs.2012.04.104.
Zhou, Jun, Unat, Didem, Choi, Dong Ju, Guest, Clark C., & Cui, Yifeng. Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset. United States. https://doi.org/10.1016/j.procs.2012.04.104
Zhou, Jun, Unat, Didem, Choi, Dong Ju, Guest, Clark C., and Cui, Yifeng. 2012. "Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset". United States. https://doi.org/10.1016/j.procs.2012.04.104. https://www.osti.gov/servlets/purl/1567289.
@article{osti_1567289,
title = {Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset},
author = {Zhou, Jun and Unat, Didem and Choi, Dong Ju and Guest, Clark C. and Cui, Yifeng},
abstractNote = {3D simulation of earthquake ground motion is one of the most challenging computational problems in science. The emergence of graphic processing units (GPU) as an effective alternative to traditional general purpose processors has become increasingly capable in terms of accelerating scientific computing research. In this paper, we describe our experiences in porting AWP-ODC, a 3D finite difference seismic wave propagation code, to the latest GPU Fermi chipset. We completely rewrote this Fortran-based 13-point asymmetric stencil computation code in C and MPI-CUDA in order to take advantage of the powerful GPU computing capabilities. Our new CUDA code implemented the asymmetric 3D stencil on Fermi to make the best use of GPU on-chip memory for an aggressive parallel efficiency. Benchmark on NVIDIA Tesla M2090 demonstrated 10x speedup versus the original fully optimized AWP-ODC FORTRAN MPI code running on a single Intel Nehalem 2.4 GHz CPU socket (4 cores/CPU), and 15x speedup versus the same MPI code running on a single AMD Istanbul 2.6 GHz CPU socket (6 cores/CPU). Sustained single-GPU performance of 143.8 GFLOPS in single precision is benchmarked for the testing case of 128x128x960 mesh size.},
doi = {10.1016/j.procs.2012.04.104},
url = {https://www.osti.gov/biblio/1567289},
journal = {Procedia Computer Science},
issn = {1877-0509},
number = {C},
volume = 9,
place = {United States},
year = {2012},
month = {6}
}


Citation Metrics:
Cited by: 11 works (citation information provided by Web of Science)

