OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication

Abstract

We present performance results and an analysis of a message passing interface (MPI)/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations. The OpenACC implementation covers all solution routines, including a highly tuned element-by-element operator evaluation and a GPUDirect gather–scatter kernel to effect nearest neighbor flux exchanges. Modifications are designed to make effective use of vectorization, streaming, and data management. Performance results using up to 16,384 graphics processing units of the Cray XK7 supercomputer Titan show more than 2.5× speedup over central processing unit-only performance on the same number of nodes (262,144 MPI ranks) for problem sizes of up to 6.9 billion grid points. We discuss performance-enhancement strategies and the overall potential of GPU-based computing for this class of problems.
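The two ingredients named in the abstract, an element-by-element operator evaluation offloaded with OpenACC and a GPUDirect gather-scatter for nearest-neighbor flux exchanges, can be illustrated with a minimal C/OpenACC sketch. This is not the authors' NekCEM code: all sizes and names (NP, NELT, apply_dr, exchange_faces, the 1-D ring of neighbors) are illustrative stand-ins for the paper's spectral-element operators and gather-scatter pattern.

#include <mpi.h>
#include <stdlib.h>

#define NP    8                      /* grid points per direction in one element */
#define NELT  512                    /* elements owned by this MPI rank          */
#define NPTS  (NP * NP * NP)         /* points per element                       */
#define NFACE (NP * NP)              /* points on one exchanged element face     */

/* Element-by-element r-derivative: du(i,j,k) = sum_m D(i,m) * u(m,j,k).
   Each element is an independent dense tensor-product contraction, which is
   what makes the operator evaluation map well onto the GPU.                 */
static void apply_dr(const double *restrict D,
                     const double *restrict u,
                     double *restrict du)
{
    #pragma acc parallel loop collapse(2) \
        present(D[0:NP*NP], u[0:NPTS*NELT], du[0:NPTS*NELT])
    for (int e = 0; e < NELT; ++e)
        for (int jk = 0; jk < NP * NP; ++jk) {
            const double *ue = u  + (size_t)e * NPTS + (size_t)jk * NP;
            double       *de = du + (size_t)e * NPTS + (size_t)jk * NP;
            #pragma acc loop seq
            for (int i = 0; i < NP; ++i) {
                double s = 0.0;
                for (int m = 0; m < NP; ++m)
                    s += D[i * NP + m] * ue[m];
                de[i] = s;
            }
        }
}

/* GPUDirect-style flux exchange: host_data hands the *device* addresses of
   the face buffers to a CUDA-aware MPI, so the nearest-neighbor exchange
   goes GPU-to-GPU without staging the data through host memory.            */
static void exchange_faces(double *sendbuf, double *recvbuf,
                           int left, int right, MPI_Comm comm)
{
    MPI_Request req[4];
    #pragma acc host_data use_device(sendbuf, recvbuf)
    {
        MPI_Irecv(recvbuf,         NFACE, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Irecv(recvbuf + NFACE, NFACE, MPI_DOUBLE, right, 1, comm, &req[1]);
        MPI_Isend(sendbuf,         NFACE, MPI_DOUBLE, right, 0, comm, &req[2]);
        MPI_Isend(sendbuf + NFACE, NFACE, MPI_DOUBLE, left,  1, comm, &req[3]);
    }
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank - 1 + size) % size;    /* 1-D periodic neighbor layout */
    int right = (rank + 1) % size;

    double *D    = calloc(NP * NP, sizeof *D);
    double *u    = calloc((size_t)NPTS * NELT, sizeof *u);
    double *du   = calloc((size_t)NPTS * NELT, sizeof *du);
    double *sbuf = calloc(2 * NFACE, sizeof *sbuf);
    double *rbuf = calloc(2 * NFACE, sizeof *rbuf);

    /* Keep the bulk field data resident on the GPU for the whole time loop;
       only the small face buffers ever cross the network.                   */
    #pragma acc data copyin(D[0:NP*NP], u[0:NPTS*NELT]) \
                     create(du[0:NPTS*NELT], sbuf[0:2*NFACE], rbuf[0:2*NFACE])
    {
        for (int step = 0; step < 10; ++step) {
            apply_dr(D, u, du);
            exchange_faces(sbuf, rbuf, left, right, MPI_COMM_WORLD);
        }
    }

    free(D); free(u); free(du); free(sbuf); free(rbuf);
    MPI_Finalize();
    return 0;
}

A sketch like this needs an OpenACC compiler and a CUDA-aware MPI library (for example, mpicc -acc with the NVIDIA/PGI compilers; on the Cray XK7 the GPU-to-GPU path is provided by Cray MPT's GPUDirect support). The point it mirrors from the paper is the data-management strategy: field data stays resident on the device across the time loop, while only small face buffers participate in the nearest-neighbor exchange.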

Authors:
 Otten, Matthew [1]; Gong, Jing [2]; Mametjanov, Azamat [3]; Vose, Aaron [4]; Levesque, John [4]; Fischer, Paul [5]; Min, Misun [3]
  1. Department of Physics, Cornell University, Ithaca, NY, USA; Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA
  2. KTH Royal Institute of Technology, Stockholm, Sweden
  3. Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA
  4. Cray’s Supercomputing Center of Excellence, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  5. Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA; Department of Computer Science, University of Illinois at Urbana–Champaign, Champaign, IL, USA; Department of Mechanical Engineering, University of Illinois at Urbana–Champaign, Champaign, IL, USA
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1565523
DOE Contract Number:  
AC05-00OR22725; AC02-06CH11357
Resource Type:
Journal Article
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 30; Journal Issue: 3; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
Computer Science

Citation Formats

Otten, Matthew, Gong, Jing, Mametjanov, Azamat, Vose, Aaron, Levesque, John, Fischer, Paul, and Min, Misun. An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication. United States: N. p., 2016. Web. doi:10.1177/1094342015626584.
Otten, Matthew, Gong, Jing, Mametjanov, Azamat, Vose, Aaron, Levesque, John, Fischer, Paul, & Min, Misun. An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication. United States. doi:10.1177/1094342015626584.
Otten, Matthew, Gong, Jing, Mametjanov, Azamat, Vose, Aaron, Levesque, John, Fischer, Paul, and Min, Misun. 2016. "An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication". United States. doi:10.1177/1094342015626584.
@article{osti_1565523,
title = {An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication},
author = {Otten, Matthew and Gong, Jing and Mametjanov, Azamat and Vose, Aaron and Levesque, John and Fischer, Paul and Min, Misun},
abstractNote = {We present performance results and an analysis of a message passing interface (MPI)/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations. The OpenACC implementation covers all solution routines, including a highly tuned element-by-element operator evaluation and a GPUDirect gather–scatter kernel to effect nearest neighbor flux exchanges. Modifications are designed to make effective use of vectorization, streaming, and data management. Performance results using up to 16,384 graphics processing units of the Cray XK7 supercomputer Titan show more than 2.5× speedup over central processing unit-only performance on the same number of nodes (262,144 MPI ranks) for problem sizes of up to 6.9 billion grid points. We discuss performance-enhancement strategies and the overall potential of GPU-based computing for this class of problems.},
doi = {10.1177/1094342015626584},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = 3,
volume = 30,
place = {United States},
year = {2016},
month = {7}
}
