OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Acceleration of the IMplicit–EXplicit nonhydrostatic unified model of the atmosphere on manycore processors

Abstract

We present the acceleration of an IMplicit-EXplicit (IMEX) nonhydrostatic atmospheric model on manycore processors such as graphic processing units (GPUs) and Intel's Many Integrated Core (MIC) architecture. IMEX time integration methods sidestep the constraint imposed by the Courant-Friedrichs-Lewy condition on explicit methods through corrective implicit solves within each time step. In this work, we implement and evaluate the performance of IMEX on manycore processors relative to explicit methods. Using 3D-IMEX at Courant number C = 15, we obtained a speedup of about 4x relative to an explicit time stepping method run with the maximum allowable C = 1. Moreover, the unconditional stability of IMEX with respect to the fast waves means the speedup can increase significantly with the Courant number as long as the accuracy of the resulting solution is acceptable. We show a speedup of 100x at C = 150 using 1D-IMEX to demonstrate this point. Several improvements on the IMEX procedure were necessary in order to outperform our results with explicit methods: (a) reducing the number of degrees of freedom of the IMEX formulation by forming the Schur complement, (b) formulating a horizontally explicit vertically implicit 1D-IMEX scheme that has a lower workload and better scalability than 3D-IMEX, (c) using high-order polynomial preconditioners to reduce the condition number of the resulting system, and (d) using a direct solver for the 1D-IMEX method by performing and storing LU factorizations once to obtain a constant cost for any Courant number. Without all of these improvements, explicit time integration methods turned out to be difficult to beat. We discuss in detail the IMEX infrastructure required for formulating and implementing efficient methods on manycore processors. Several parametric studies are conducted to demonstrate the gain from each of the abovementioned improvements. Lastly, we validate our results with standard benchmark problems in numerical weather prediction and evaluate the performance and scalability of the IMEX method using up to 4192 GPUs and 16 Knights Landing processors.
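
Illustrative sketch (not from the paper): the abstract's point (d), factoring the implicit operator once and reusing it every step, can be shown on a toy problem. The code below is a minimal first-order IMEX Euler step, assuming a 1D diffusion operator as a stand-in for the stiff part and a weak linear damping term as the slow part. It does not reproduce the paper's equations, Schur complement form, or time integrators, and every function and parameter name is hypothetical. Here r = nu*dt/dx^2 is the stability parameter of the toy problem (explicit Euler needs roughly r <= 1/2); it is only an analogue of the Courant number, not the same quantity.

```python
# Toy IMEX Euler sketch. Assumptions: 1D diffusion as the stiff linear part,
# zero Dirichlet boundaries, linear damping as the slow explicit part.
# One step solves (I - dt*L) u_new = u + dt*N(u), with L tridiagonal.

def factor_tridiagonal(lower, diag, upper):
    """LU-factor a tridiagonal matrix once, without pivoting
    (adequate for diagonally dominant matrices).
    lower[i] is the entry left of the diagonal in row i (lower[0] unused),
    upper[i] is the entry right of the diagonal in row i (last unused)."""
    n = len(diag)
    mult = [0.0] * n
    piv = [0.0] * n
    piv[0] = diag[0]
    for i in range(1, n):
        mult[i] = lower[i] / piv[i - 1]
        piv[i] = diag[i] - mult[i] * upper[i - 1]
    return mult, piv, list(upper)

def solve_factored(factors, rhs):
    """Forward and back substitution using stored factors; O(n) per call."""
    mult, piv, upper = factors
    n = len(rhs)
    y = list(rhs)
    for i in range(1, n):
        y[i] -= mult[i] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / piv[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - upper[i] * x[i + 1]) / piv[i]
    return x

def imex_euler(u0, r, dt, steps, slow_rate=0.1):
    """Diffusion implicit (factored once), damping explicit."""
    n = len(u0)
    factors = factor_tridiagonal([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n)
    u = list(u0)
    for _ in range(steps):
        rhs = [ui - dt * slow_rate * ui for ui in u]  # explicit slow term
        u = solve_factored(factors, rhs)              # reuse stored factors
    return u

def explicit_euler(u0, r, dt, steps, slow_rate=0.1):
    """Fully explicit reference; unstable once r exceeds about 1/2."""
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            new.append(u[i] + r * (left - 2.0 * u[i] + right)
                       - dt * slow_rate * u[i])
        u = new
    return u

if __name__ == "__main__":
    n, nu = 31, 1.0
    dx = 1.0 / (n + 1)
    r = 7.5                      # 15 times the explicit limit of 0.5
    dt = r * dx * dx / nu
    u0 = [0.0] * n
    u0[n // 2] = 1.0             # spike excites the high-frequency modes
    print("IMEX max |u|:    ", max(abs(v) for v in imex_euler(u0, r, dt, 50)))
    print("explicit max |u|:", max(abs(v) for v in explicit_euler(u0, r, dt, 50)))
```

In this sketch the per-step cost of the implicit solve is the same for any r, because the factorization depends on r only at setup time. That is the sense in which a stored factorization gives a cost independent of step size. Whether a larger step remains accurate enough is a separate question, as the abstract notes.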

Authors:
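Abdi, Daniel S. [1]; Giraldo, Francis X. [1]; Constantinescu, Emil M. [2]; Carr, III, Lester E. [1]; Wilcox, Lucas C. [1]; Warburton, Timothy C. [3]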
  1. Naval Postgraduate School, Monterey, CA (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); US Department of the Navy, Office of Naval Research (ONR)
OSTI Identifier:
1498290
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
International Journal of High Performance Computing Applications
Additional Journal Information:
Journal Volume: 33; Journal Issue: 2; Journal ID: ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; IMEX; NUMA; GPU; KNL; manycore; HPC; OCCA; atmospheric model; discontinuous Galerkin; continuous Galerkin

Citation Formats

Abdi, Daniel S., Giraldo, Francis X., Constantinescu, Emil M., Carr, III, Lester E., Wilcox, Lucas C., and Warburton, Timothy C. Acceleration of the IMplicit–EXplicit nonhydrostatic unified model of the atmosphere on manycore processors. United States: N. p., 2017. Web. doi:10.1177/1094342017732395.
Abdi, Daniel S., Giraldo, Francis X., Constantinescu, Emil M., Carr, III, Lester E., Wilcox, Lucas C., & Warburton, Timothy C. Acceleration of the IMplicit–EXplicit nonhydrostatic unified model of the atmosphere on manycore processors. United States. doi:10.1177/1094342017732395.
Abdi, Daniel S., Giraldo, Francis X., Constantinescu, Emil M., Carr, III, Lester E., Wilcox, Lucas C., and Warburton, Timothy C. Tue . "Acceleration of the IMplicit–EXplicit nonhydrostatic unified model of the atmosphere on manycore processors". United States. doi:10.1177/1094342017732395. https://www.osti.gov/servlets/purl/1498290.
@article{osti_1498290,
title = {Acceleration of the IMplicit–EXplicit nonhydrostatic unified model of the atmosphere on manycore processors},
author = {Abdi, Daniel S. and Giraldo, Francis X. and Constantinescu, Emil M. and Carr, III, Lester E. and Wilcox, Lucas C. and Warburton, Timothy C.},
doi = {10.1177/1094342017732395},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = 2,
volume = 33,
place = {United States},
year = {2017},
month = {10}
}
