OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters

Abstract

Memory-bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with trillions ($\mathcal{O}(10^{12})$) of unknowns, the code has to make efficient use of several million individual processor cores on large GPU clusters. We describe the multi-GPU implementation of two algorithmically optimal iterative solvers for anisotropic PDEs which are encountered in (semi-)implicit time stepping procedures in atmospheric modelling. In this application the condition number is large but independent of the grid resolution, and both methods are asymptotically optimal, albeit with different absolute performance. In particular, an important constant in the discretisation is the CFL number; only the multigrid solver is robust to changes in this constant. We parallelise the solvers and adapt them to the specific features of GPU architectures, paying particular attention to efficient global memory access. We achieve a performance of up to 0.78 PFLOPs when solving an equation with $0.55 \cdot 10^{12}$ unknowns on 16384 GPUs; this corresponds to about 3% of the theoretical peak performance of the machine, and we use more than 40% of the peak memory bandwidth with a Conjugate Gradient (CG) solver. Although the other solver, a geometric multigrid algorithm, has a slightly worse performance in terms of FLOPs per second, overall it is faster as it needs fewer iterations to converge; the multigrid algorithm can solve a linear PDE with half a trillion unknowns in about one second.
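The headline figures in the abstract can be put in per-GPU terms with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the inputs are the rounded values quoted in the abstract, and because "about 3%" is itself an approximation, the implied peak is a rough estimate rather than a figure stated by the authors.

```python
# Back-of-the-envelope arithmetic on the numbers quoted in the abstract.
# All function names are hypothetical helpers introduced for illustration.

TOTAL_UNKNOWNS = 0.55e12     # unknowns in the largest reported solve
NUM_GPUS = 16384             # GPUs used for that solve
ACHIEVED_PFLOPS = 0.78       # sustained performance of the CG solver (PFLOP/s)
FRACTION_OF_PEAK = 0.03      # "about 3%" of theoretical peak (rounded)


def unknowns_per_gpu(total_unknowns: float, num_gpus: int) -> float:
    """Average number of unknowns each GPU is responsible for."""
    return total_unknowns / num_gpus


def implied_peak_pflops(achieved_pflops: float, fraction_of_peak: float) -> float:
    """Theoretical peak implied by the achieved rate and its quoted fraction."""
    return achieved_pflops / fraction_of_peak


def achieved_gflops_per_gpu(achieved_pflops: float, num_gpus: int) -> float:
    """Sustained rate per GPU in GFLOP/s (1 PFLOP/s = 1e6 GFLOP/s)."""
    return achieved_pflops * 1e6 / num_gpus


if __name__ == "__main__":
    print(f"unknowns per GPU:   {unknowns_per_gpu(TOTAL_UNKNOWNS, NUM_GPUS):.3e}")
    print(f"implied peak:       {implied_peak_pflops(ACHIEVED_PFLOPS, FRACTION_OF_PEAK):.1f} PFLOP/s")
    print(f"achieved per GPU:   {achieved_gflops_per_gpu(ACHIEVED_PFLOPS, NUM_GPUS):.1f} GFLOP/s")
```

Under these inputs each GPU holds roughly 3.4 · 10^7 unknowns and sustains on the order of 50 GFLOP/s. This is consistent with the abstract's description of the solvers as memory bound: the fraction of peak floating-point performance is small while the fraction of peak memory bandwidth is large.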

Authors:
Müller, Eike Hermann; Scheichl, Robert; Vainikko, Eero
Publication Date:
December 2015
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); UT-Battelle LLC/ORNL, Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1565438
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Journal Article
Journal Name:
Parallel Computing
Additional Journal Information:
Journal Volume: 50; Journal Issue: C; Journal ID: ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
Computer Science

Citation Formats

Müller, Eike Hermann, Scheichl, Robert, and Vainikko, Eero. Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters. United States: N. p., 2015. Web. doi:10.1016/j.parco.2015.10.007.
Müller, Eike Hermann, Scheichl, Robert, & Vainikko, Eero. Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters. United States. https://doi.org/10.1016/j.parco.2015.10.007
Müller, Eike Hermann, Scheichl, Robert, and Vainikko, Eero. 2015. "Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters". United States. https://doi.org/10.1016/j.parco.2015.10.007.
@article{osti_1565438,
title = {Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters},
author = {Müller, Eike Hermann and Scheichl, Robert and Vainikko, Eero},
abstractNote = {Memory-bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with trillions ($\mathcal{O}(10^{12})$) of unknowns, the code has to make efficient use of several million individual processor cores on large GPU clusters. We describe the multi-GPU implementation of two algorithmically optimal iterative solvers for anisotropic PDEs which are encountered in (semi-)implicit time stepping procedures in atmospheric modelling. In this application the condition number is large but independent of the grid resolution, and both methods are asymptotically optimal, albeit with different absolute performance. In particular, an important constant in the discretisation is the CFL number; only the multigrid solver is robust to changes in this constant. We parallelise the solvers and adapt them to the specific features of GPU architectures, paying particular attention to efficient global memory access. We achieve a performance of up to 0.78 PFLOPs when solving an equation with $0.55 \cdot 10^{12}$ unknowns on 16384 GPUs; this corresponds to about 3% of the theoretical peak performance of the machine, and we use more than 40% of the peak memory bandwidth with a Conjugate Gradient (CG) solver. Although the other solver, a geometric multigrid algorithm, has a slightly worse performance in terms of FLOPs per second, overall it is faster as it needs fewer iterations to converge; the multigrid algorithm can solve a linear PDE with half a trillion unknowns in about one second.},
doi = {10.1016/j.parco.2015.10.007},
url = {https://www.osti.gov/biblio/1565438},
journal = {Parallel Computing},
issn = {0167-8191},
number = {C},
volume = 50,
place = {United States},
year = {2015},
month = {12}
}