DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers

Abstract

We have developed a highly scalable 3D Finite Difference GPU code for use in earthquake engineering and disaster management through regional petascale earthquake simulations. This MPI-CUDA code is based on a widely used wave propagation code called AWP-ODC and has been restructured for high throughput and efficiency on a heterogeneous computing architecture. We present an effective communication reduction technique for leveraging GPUs with minimal PCI-e overhead, and a novel overlapping method to fully hide data communication latency between GPUs. The optimization concept used in this work can be extended to general stencil computing on a structured grid. The benchmarks demonstrated a sustained 100 TFlops in single precision for 49 billion mesh points using 952 GPUs on the NCCS Titan Phase 5 system, which is a 77-fold speedup compared to the CPU version of the code. This multi-GPU implementation has been validated and used for a large-scale verification wave propagation simulation of the Mw 5.4 Chino Hills earthquake using 128 GPUs.
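The overlapping idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the AWP-ODC scheme or the authors' implementation: it assumes a generic 1D second-order wave stencil, two subdomains with one ghost cell each, and plain Python lists standing in for GPU buffers and MPI messages. The function names (`update`, `step_reference`, `step_decomposed`, `run`) and all parameters are hypothetical. It only shows the ordering that makes overlap possible: update the cells next to the subdomain interface first, hand them off for exchange, then update the interior while the exchange would be in flight.

```python
import math


def update(u_prev, u_curr, u_next, lo, hi, c2):
    """Apply a 3-point wave stencil to cells lo..hi-1, writing into u_next."""
    for i in range(lo, hi):
        lap = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * lap


def step_reference(u_prev, u_curr, c2):
    """One time step on the undivided grid; both end cells stay fixed at zero."""
    u_next = [0.0] * len(u_curr)
    update(u_prev, u_curr, u_next, 1, len(u_curr) - 1, c2)
    return u_next


def step_decomposed(left, right, c2):
    """One time step on two subdomains, boundary cells first, interior second."""
    (lp, lc), (rp, rc) = left, right
    ln, rn = [0.0] * len(lc), [0.0] * len(rc)

    # 1. Update only the owned cells adjacent to the interface.
    update(lp, lc, ln, len(lc) - 2, len(lc) - 1, c2)
    update(rp, rc, rn, 1, 2, c2)

    # 2. A real code would start a non-blocking transfer here (assumption:
    #    asynchronous copy plus non-blocking message). We just keep the values.
    to_right, to_left = ln[-2], rn[1]

    # 3. Update the interior; this is the work that would hide the transfer.
    update(lp, lc, ln, 1, len(lc) - 2, c2)
    update(rp, rc, rn, 2, len(rc) - 1, c2)

    # 4. Complete the exchange by filling each side's ghost cell.
    ln[-1] = to_left
    rn[0] = to_right
    return (lc, ln), (rc, rn)


def run(n=64, steps=50, c2=0.25):
    """Advance both versions from the same Gaussian pulse and return the results."""
    u0 = [math.exp(-200.0 * (i / (n - 1) - 0.5) ** 2) for i in range(n)]
    u0[0] = u0[-1] = 0.0
    ref_prev, ref_curr = list(u0), list(u0)

    h = n // 2
    # Left owns cells 0..h-1 plus a ghost copy of cell h;
    # right owns cells h..n-1 plus a ghost copy of cell h-1.
    left = (u0[:h + 1], u0[:h + 1])
    right = (u0[h - 1:], u0[h - 1:])

    for _ in range(steps):
        ref_prev, ref_curr = ref_curr, step_reference(ref_prev, ref_curr, c2)
        left, right = step_decomposed(left, right, c2)

    # Drop the ghost cells and stitch the owned cells back together.
    decomposed = left[1][:-1] + right[1][1:]
    return ref_curr, decomposed


if __name__ == "__main__":
    ref, dec = run()
    print(max(abs(a - b) for a, b in zip(ref, dec)))
```

Because the interior update in step 3 does not read the new interface values, it can proceed independently of the exchange started in step 2. That independence is what allows communication to be hidden behind computation in a real multi-GPU code.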

Authors:
 Zhou, Jun [1]; Cui, Yifeng [2]; Poyraz, Efecan [1]; Choi, Dong Ju [2]; Guest, Clark C. [3]
  1. Univ. of California, San Diego, CA (United States). San Diego Supercomputer Center; Univ. of California, San Diego, CA (United States). Dept. of Electrical and Computer Engineering
  2. Univ. of California, San Diego, CA (United States). San Diego Supercomputer Center
  3. Univ. of California, San Diego, CA (United States). Dept. of Electrical and Computer Engineering
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567324
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Procedia Computer Science
Additional Journal Information:
Journal Volume: 18; Journal Issue: C; Journal ID: ISSN 1877-0509
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 58 GEOSCIENCES; Computer Science; Earthquake Simulations; CUDA-MPI; Enhanced Overlapping Design; Heterogeneous Supercomputers

Citation Formats

Zhou, Jun, Cui, Yifeng, Poyraz, Efecan, Choi, Dong Ju, and Guest, Clark C. Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers. United States: N. p., 2013. Web. https://doi.org/10.1016/j.procs.2013.05.292.
Zhou, Jun, Cui, Yifeng, Poyraz, Efecan, Choi, Dong Ju, & Guest, Clark C. Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers. United States. https://doi.org/10.1016/j.procs.2013.05.292
Zhou, Jun, Cui, Yifeng, Poyraz, Efecan, Choi, Dong Ju, and Guest, Clark C. "Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers". United States. https://doi.org/10.1016/j.procs.2013.05.292. https://www.osti.gov/servlets/purl/1567324.
@article{osti_1567324,
title = {Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers},
author = {Zhou, Jun and Cui, Yifeng and Poyraz, Efecan and Choi, Dong Ju and Guest, Clark C.},
abstractNote = {We have developed a highly scalable 3D Finite Difference GPU code for use in earthquake engineering and disaster management through regional petascale earthquake simulations. This MPI-CUDA code is based on a widely used wave propagation code called AWP-ODC and has been restructured for high throughput and efficiency on a heterogeneous computing architecture. We present an effective communication reduction technique for leveraging GPUs with minimal PCI-e overhead, and a novel overlapping method to fully hide data communication latency between GPUs. The optimization concept used in this work can be extended to general stencil computing on a structured grid. The benchmarks demonstrated a sustained 100 TFlops in single precision for 49 billion mesh points using 952 GPUs on the NCCS Titan Phase 5 system, which is a 77-fold speedup compared to the CPU version of the code. This multi-GPU implementation has been validated and used for a large-scale verification wave propagation simulation of the Mw 5.4 Chino Hills earthquake using 128 GPUs.},
doi = {10.1016/j.procs.2013.05.292},
journal = {Procedia Computer Science},
number = {C},
volume = 18,
place = {United States},
year = {2013},
month = {6}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 10 works
Citation information provided by
Web of Science
