
Title: Performance Analysis of Large Scale HPC Workflows for Earth System Models

Abstract

Climate assessment reports, such as the Intergovernmental Panel on Climate Change’s 5th Assessment Report [IPCC AR5; IPCC et al., 2014] or the United States National Climate Assessment [U.S. Global Change Research Program, 2014], increasingly rely on many-ensemble climate projections from fully-coupled Earth system models (ESM) to assess possible future climate states. ESM modeling centers participate in a large set of Coupled Model Intercomparison Projects (CMIP) to better understand their models and the climate system, so that they may provide credible climate projections. However, fully participating in these intercomparison exercises represents an extremely large computational and personnel burden for modeling centers [Eyring et al., 2016, Williams et al., 2016a]. Despite this burden, the climate modeling community has so far paid little attention to the in situ performance of ESMs. Recently a computational performance model intercomparison project (CPMIP) for CMIP6 has begun [Balaji et al., 2017], but there has not yet been an effort to gather, process, and analyze the historical performance data saved within archived log files from previous modeling efforts, nor has a performance analysis system been developed that is capable of providing detailed statistical analyses or visualizations of this performance data. Therefore, we’ve begun a joint effort to analyze the historical and current performance data on both GFDL and OLCF modeling systems. Here, we have developed a preliminary performance analysis system to capture, analyze, and visualize end-to-end in situ performance data and applied it to NOAA/GFDL’s post-processing system. Through the course of the project, a number of performance improvements have been made, and the steps necessary to handle the expected volume of data (Big Data) likely to come out of the CMIP6 exercises (TB to PB) are provided.
We have also begun developing a statistical workload model that will allow modeling centers to determine their expected scientific throughput in the context of expected resource (computational and personnel) availability, accounting for typical workflow disruptions. This will allow them to develop a CMIP participation strategy and maximize their level of participation in the exercises. We expect this infrastructure, and the data it produces, to ultimately allow us to identify performance issues and workflow errors that exist across Earth system modeling centers, and possibly HPC modeling centers in general. Center-independent solutions to workflow or performance issues would help alleviate the computational burden for CMIP6+ modeling projects and should realize both labor and cost savings throughout the CMIP community.

Authors:
 Kennedy, Joseph H. [1]; Mayer, Benjamin W. [1]; Evans, Katherine J. [1]; Duracha, Jeff [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. National Oceanic and Atmospheric Administration (NOAA), Boulder, CO (United States)
Publication Date:
November 2017
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Oceanic and Atmospheric Administration (NOAA), Boulder, CO (United States)
OSTI Identifier:
1439154
Report Number(s):
ORNL/TM-2017/540
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 54 ENVIRONMENTAL SCIENCES

Citation Formats

Kennedy, Joseph H., Mayer, Benjamin W., Evans, Katherine J., and Duracha, Jeff. Performance Analysis of Large Scale HPC Workflows for Earth System Models. United States: N. p., 2017. Web. doi:10.2172/1439154.
Kennedy, Joseph H., Mayer, Benjamin W., Evans, Katherine J., & Duracha, Jeff. Performance Analysis of Large Scale HPC Workflows for Earth System Models. United States. doi:10.2172/1439154.
Kennedy, Joseph H., Mayer, Benjamin W., Evans, Katherine J., and Duracha, Jeff. 2017. "Performance Analysis of Large Scale HPC Workflows for Earth System Models". United States. doi:10.2172/1439154. https://www.osti.gov/servlets/purl/1439154.
@article{osti_1439154,
title = {Performance Analysis of Large Scale HPC Workflows for Earth System Models},
author = {Kennedy, Joseph H. and Mayer, Benjamin W. and Evans, Katherine J. and Duracha, Jeff},
abstractNote = {Climate assessment reports, such as the Intergovernmental Panel on Climate Change’s 5th Assessment Report [IPCC AR5; IPCC et al., 2014] or the United States National Climate Assessment [U.S. Global Change Research Program, 2014], increasingly rely on many-ensemble climate projections from fully-coupled Earth system models (ESM) to assess possible future climate states. ESM modeling centers participate in a large set of Coupled Model Intercomparison Projects (CMIP) to better understand their models and the climate system, so that they may provide credible climate projections. However, fully participating in these intercomparison exercises represents an extremely large computational and personnel burden for modeling centers [Eyring et al., 2016, Williams et al., 2016a]. Despite this burden, the climate modeling community has so far paid little attention to the in situ performance of ESMs. Recently a computational performance model intercomparison project (CPMIP) for CMIP6 has begun [Balaji et al., 2017], but there has not yet been an effort to gather, process, and analyze the historical performance data saved within archived log files from previous modeling efforts, nor has a performance analysis system been developed that is capable of providing detailed statistical analyses or visualizations of this performance data. Therefore, we’ve begun a joint effort to analyze the historical and current performance data on both GFDL and OLCF modeling systems. Here, we have developed a preliminary performance analysis system to capture, analyze, and visualize end-to-end in situ performance data and applied it to NOAA/GFDL’s post-processing system. Through the course of the project, a number of performance improvements have been made, and the steps necessary to handle the expected volume of data (Big Data) likely to come out of the CMIP6 exercises (TB to PB) are provided.
We have also begun developing a statistical workload model that will allow modeling centers to determine their expected scientific throughput in the context of expected resource (computational and personnel) availability, accounting for typical workflow disruptions. This will allow them to develop a CMIP participation strategy and maximize their level of participation in the exercises. We expect this infrastructure, and the data it produces, to ultimately allow us to identify performance issues and workflow errors that exist across Earth system modeling centers, and possibly HPC modeling centers in general. Center-independent solutions to workflow or performance issues would help alleviate the computational burden for CMIP6+ modeling projects and should realize both labor and cost savings throughout the CMIP community.},
doi = {10.2172/1439154},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2017},
month = {11}
}
