OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling

Abstract

Global chemistry–climate models are computationally burdened as their chemical mechanisms become more complex and realistic. Optimization for graphics processing units (GPUs) may make longer global simulations with regional detail possible, but few studies have explored the potential benefit for atmospheric chemistry modeling. In this study, the second-order Rosenbrock solver of the CAM4-Chem chemistry module is ported to the GPU to gauge the potential speed-up. On the CPU, the fastest performance is achieved using the Intel compiler with a block interleaved memory layout; different combinations of compiler and memory layout lead to a ~11.02× difference in computational time. In contrast, the GPU version performs best with a combination of a fully interleaved memory layout with block size equal to the warp size, CUDA streams for independent kernels, and constant memory. Moreover, the most efficient data transfer between CPU and GPU is obtained by allocating memory contiguously during data initialization on the GPU. Compared to one CPU core, one GPU alone achieves a speed-up of ~11.7× for the computation alone and ~3.82× when the data transfer between CPU and GPU is included. One GPU alone is also generally faster than both the multithreaded implementation on the 16 CPU cores of a compute node and the single-source solution (OpenACC). In conclusion, the best performance is achieved by the hybrid CPU/GPU implementation, but the workload among the CPU cores must be rescheduled before it can be used in practical CAM4-Chem simulations.
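
The record does not include the solver code. As a minimal sketch, assuming the two-stage ROS2 scheme of Verwer et al. (1999) with gamma = 1 + 1/sqrt(2), one step for dy/dt = f(y) solves two linear systems that share the matrix W = I - gamma*h*J. The helper names `solve` and `ros2_step` are hypothetical, and the step-size control and sparse Jacobian handling of the actual CAM4-Chem solver are not shown.

```python
import math

# ROS2 parameter gamma = 1 + 1/sqrt(2); this choice makes the scheme L-stable
GAMMA = 1.0 + 1.0 / math.sqrt(2.0)


def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]  # augmented copy of [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ros2_step(f, jac, y, h):
    """Advance dy/dt = f(y) by one step h with the two-stage ROS2 scheme."""
    n = len(y)
    jm = jac(y)  # one Jacobian evaluation per step
    # W = I - gamma*h*J is shared by both stages
    w = [[(1.0 if r == c else 0.0) - GAMMA * h * jm[r][c] for c in range(n)]
         for r in range(n)]
    k1 = solve(w, f(y))
    f2 = f([y[i] + h * k1[i] for i in range(n)])
    # For brevity W is eliminated again here; a real solver would factor it once and reuse the LU
    k2 = solve(w, [f2[i] - 2.0 * k1[i] for i in range(n)])
    return [y[i] + h * (1.5 * k1[i] + 0.5 * k2[i]) for i in range(n)]
```

Each grid cell's step depends only on that cell's own state, which is what allows many cells to be integrated in parallel, one per GPU thread or one per CPU loop iteration.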
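
To illustrate the memory layouts named in the abstract, the hypothetical index functions below map value `i` (for example, one species concentration) of grid cell `b`, in a batch of `nbatch` cells each holding `n` values, to a flat array offset. This is a sketch: the record does not give the exact layouts or block widths used, so `block` is a free parameter here. In the fully interleaved layout, threads that handle consecutive cells read consecutive addresses for the same `i`, which is the access pattern that lets a GPU coalesce a warp's loads.

```python
def idx_contiguous(b, i, n, nbatch):
    """Non-interleaved: all n values of cell b are stored together."""
    return b * n + i


def idx_fully_interleaved(b, i, n, nbatch):
    """Fully interleaved: value i of every cell is stored together."""
    return i * nbatch + b


def idx_block_interleaved(b, i, n, nbatch, block):
    """Block interleaved: cells grouped in blocks of `block`, fully interleaved inside each block."""
    # Sketch assumption: the batch is padded to a whole number of blocks
    assert nbatch % block == 0, "pad nbatch to a multiple of block"
    return (b // block) * (n * block) + i * block + (b % block)
```

For example, with `n = 5` and `nbatch = 64`, the 32 threads of one warp reading value `i = 3` of cells 0 to 31 touch offsets 192 to 223 in the fully interleaved layout, but offsets with a stride of 5 in the contiguous layout.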
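
The two GPU speed-ups quoted in the abstract also imply how much of the GPU-side wall time goes to data transfer. Assuming both factors are measured against the same single-core CPU time T_CPU, and that transfer does not overlap with computation (so T_total = T_comp + T_transfer), a rough estimate is:

```latex
\[
T_{\mathrm{comp}} = \frac{T_{\mathrm{CPU}}}{S_{\mathrm{comp}}}, \qquad
T_{\mathrm{total}} = \frac{T_{\mathrm{CPU}}}{S_{\mathrm{total}}}
\]
\[
\frac{T_{\mathrm{transfer}}}{T_{\mathrm{total}}}
  = \frac{T_{\mathrm{total}} - T_{\mathrm{comp}}}{T_{\mathrm{total}}}
  = 1 - \frac{S_{\mathrm{total}}}{S_{\mathrm{comp}}}
  = 1 - \frac{3.82}{11.7} \approx 0.67
\]
```

Under these assumptions, roughly two-thirds of the GPU-side time would be spent moving data between CPU and GPU rather than computing.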
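
The record does not say how the hybrid CPU/GPU version distributes work. As a purely hypothetical illustration of static rescheduling, the sketch below splits a number of work items (for example, grid columns) among devices in proportion to throughputs that would have to be measured beforehand; `split_columns` and its inputs are assumptions, not the authors' scheduler.

```python
import math


def split_columns(ncols, rates):
    """Split ncols work items among devices in proportion to their measured rates.

    Uses largest-remainder rounding so the parts are integers that sum to ncols.
    """
    total = sum(rates)
    quotas = [ncols * r / total for r in rates]  # ideal, possibly fractional, shares
    parts = [math.floor(q) for q in quotas]
    leftover = ncols - sum(parts)  # always fewer than len(rates)
    # Give the remaining items to the devices with the largest fractional remainders
    order = sorted(range(len(rates)), key=lambda d: quotas[d] - parts[d], reverse=True)
    for d in order[:leftover]:
        parts[d] += 1
    return parts
```

For example, `split_columns(10, [1, 1, 1])` returns `[4, 3, 3]`: ties in the remainders go to the lower device index because Python's sort is stable even with `reverse=True`.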

Authors:
Sun, Jian [1]; Fu, Joshua S. [2]; Drake, John B. [3]; Zhu, Qingzhao [3]; Haidar, Azzam [4]; Gates, Mark [4]; Tomov, Stanimire [5]; Dongarra, Jack [5]
  1. Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Knoxville, TN, USA; now at Atmospheric Science and Global Change Division, Pacific Northwest National Laboratory, Richland, Washington, USA
  2. Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Knoxville, TN, USA; Climate Change Science Institute and Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  3. Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Knoxville, TN, USA
  4. Innovative Computing Laboratory, University of Tennessee, Knoxville, Knoxville, TN, USA
  5. Climate Change Science Institute and Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1464561
Alternate Identifier(s):
OSTI ID: 1464562; OSTI ID: 1468957
Grant/Contract Number:  
DE‐AC05‐00OR22725; AC05-76RL01830; Z12-93537; OAC 1740250; AC05-00OR22725
Resource Type:
Journal Article: Published Article
Journal Name:
Journal of Advances in Modeling Earth Systems
Additional Journal Information:
Journal Name: Journal of Advances in Modeling Earth Systems; Journal Volume: 10; Journal Issue: 8; Journal ID: ISSN 1942-2466
Publisher:
American Geophysical Union (AGU)
Country of Publication:
United States
Language:
English
Subject:
54 ENVIRONMENTAL SCIENCES; 97 MATHEMATICS AND COMPUTING; GPU; CUDA; compiler; memory layout; data transfer; hybrid

Citation Formats

Sun, Jian, Fu, Joshua S., Drake, John B., Zhu, Qingzhao, Haidar, Azzam, Gates, Mark, Tomov, Stanimire, and Dongarra, Jack. Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling. United States: N. p., 2018. Web. doi:10.1029/2018MS001276.
Sun, Jian, Fu, Joshua S., Drake, John B., Zhu, Qingzhao, Haidar, Azzam, Gates, Mark, Tomov, Stanimire, & Dongarra, Jack. Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling. United States. https://doi.org/10.1029/2018MS001276
Sun, Jian, Fu, Joshua S., Drake, John B., Zhu, Qingzhao, Haidar, Azzam, Gates, Mark, Tomov, Stanimire, and Dongarra, Jack. 2018. "Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling". United States. https://doi.org/10.1029/2018MS001276.
@article{osti_1464561,
title = {Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling},
author = {Sun, Jian and Fu, Joshua S. and Drake, John B. and Zhu, Qingzhao and Haidar, Azzam and Gates, Mark and Tomov, Stanimire and Dongarra, Jack},
abstractNote = {Global chemistry–climate models are computationally burdened as the chemical mechanisms become more complex and realistic. Optimization for graphics processing units (GPU) may make longer global simulation with regional detail possible, but limited study has been done to explore the potential benefit for the atmospheric chemistry modeling. Hence, in this study, the second–order Rosenbrock solver of the chemistry module of CAM4–Chem is ported to the GPU to gauge potential speed–up. We find that on the CPU, the fastest performance is achieved using the Intel compiler with a block interleaved memory layout. Different combinations of compiler and memory layout lead to ~11.02× difference in the computational time. In contrast, the GPU version performs the best when using a combination of fully interleaved memory layout with block size equal to the warp size, CUDA streams for independent kernels, and constant memory. Moreover, the most efficient data transfer between CPU and GPU is gained by allocating the memory contiguously during the data initialization on the GPU. Compared to one CPU core, the speed–up of using one GPU alone reaches a factor of ~11.7× for the computation alone and ~3.82× when the data transfer between CPU and GPU is considered. Using one GPU alone is also generally faster than the multithreaded implementation for 16 CPU cores in a compute node and the single–source solution (OpenACC). In conclusion, the best performance is achieved by the implementation of the hybrid CPU/GPU version, but rescheduling the workload among the CPU cores is required before the practical CAM4–Chem simulation.},
doi = {10.1029/2018MS001276},
url = {https://www.osti.gov/biblio/1464561},
journal = {Journal of Advances in Modeling Earth Systems},
issn = {1942-2466},
number = 8,
volume = 10,
place = {United States},
year = {2018},
month = {8}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record at https://doi.org/10.1029/2018MS001276
