DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: An automatic performance model-based scheduling tool for coupled climate system models

Abstract

The prediction of the climate system depends heavily on the efficient integration of observations and simulations of the Earth. This integration is regarded as a canonical example of a cyber-physical system. The climate system model, the simulation engine in this cyber-physical system, is one of the most challenging applications in scientific computing. It relies on multi-physics simulation that couples multiple components, runs decadal to millennial simulations, and has long been an important application on supercomputers. However, current climate system models suffer from inefficient task scheduling methods, which result in intolerably long simulation times. Take the Community Earth System Model (CESM), the most widely used climate system model, as an example: one major reason for its poor performance is the large overhead of rationally distributing processes among the coupled heterogeneous components. According to an NCAR report, every percent improvement in CESM performance frees up the equivalent of $250,000 in computing resources for their scientific experiments. To address this challenge, our paper first constructs a lightweight and accurate performance model that captures and predicts the heterogeneous end-to-end time-to-solution performance of CESM components for a given simulation configuration. Then, based on the performance model, we propose an efficient scheduling strategy based on a rectangular packing method to determine the best process layout among the components and the number of processes assigned to each component. Our evaluations show an average run time reduction of 58% on CESM compared with the widely used sequential process layout at scales of 144 to 480 cores on typical CPU clusters. This saves 4 million CPU hours for one standard scientific experiment (a 2870-year simulation), equivalent to $40,089 at a charge of $0.01 per CPU hour. In addition, our method achieves a further 26% performance improvement over the heuristic branch and bound algorithm guided by the known curve-fitting performance model.
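The layout decision described in the abstract can be illustrated with a minimal sketch. It is not the paper's rectangular packing method: it assumes a hypothetical per-component model `time(p) = a / p + b`, uses invented coefficients rather than measured CESM values, and replaces the packing strategy with an exhaustive search over process splits. The function names are likewise illustrative only.

```python
from itertools import product

# Hypothetical per-component model: time(p) = a / p + b
# (a: parallelizable work, b: serial overhead). The coefficients are
# invented for illustration and are NOT measured CESM values.
MODEL = {"ATM": (1200.0, 2.0), "OCN": (600.0, 1.0), "ICE": (200.0, 0.5)}


def predicted_time(name, procs):
    """Predicted run time of one component on `procs` processes."""
    a, b = MODEL[name]
    return a / procs + b


def sequential_layout(total_procs):
    """Every component runs in turn on all processes; times add up."""
    return sum(predicted_time(name, total_procs) for name in MODEL)


def best_concurrent_layout(total_procs, step=8):
    """Search all splits (in multiples of `step`) of the processes among
    components running side by side; the slowest component sets the time."""
    names = list(MODEL)
    choices = range(step, total_procs + 1, step)
    best = None
    for alloc in product(choices, repeat=len(names) - 1):
        last = total_procs - sum(alloc)  # remainder goes to the last component
        if last < step:
            continue
        procs = dict(zip(names, alloc + (last,)))
        makespan = max(predicted_time(n, p) for n, p in procs.items())
        if best is None or makespan < best[0]:
            best = (makespan, procs)
    return best


if __name__ == "__main__":
    makespan, procs = best_concurrent_layout(144)
    print("sequential:", sequential_layout(144))
    print("concurrent:", makespan, procs)
```

Under this toy model the concurrent layout wins because each component pays its serial overhead in parallel with the others rather than one after another. A real scheduler would need a fitted performance model and a search that scales beyond brute force.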

Authors:
 Ding, Nan [1]; Xue, Wei [2]; Song, Zhenya [3]; Fu, Haohuan [4]; Xu, Shiming [5]; Zheng, Weimin [6]
  1. Tsinghua Univ., Beijing (China). Dept. of Computer Science and Technology; Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling
  2. Tsinghua Univ., Beijing (China). Dept. of Computer Science and Technology; Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling; National Supercomputing Center, Wuxi (China); Joint Center for Global Change Studies, Beijing (China)
  3. Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; State Oceanic Administration, Qingdao (China). First Inst. of Oceanography
  4. Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling; National Supercomputing Center, Wuxi (China); Joint Center for Global Change Studies, Beijing (China)
  5. Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling
  6. Tsinghua Univ., Beijing (China). Dept. of Computer Science and Technology
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1559255
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Parallel and Distributed Computing
Additional Journal Information:
Journal Volume: 132; Journal Issue: C; Journal ID: ISSN 0743-7315
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
58 GEOSCIENCES; 97 MATHEMATICS AND COMPUTING

Citation Formats

Ding, Nan, Xue, Wei, Song, Zhenya, Fu, Haohuan, Xu, Shiming, and Zheng, Weimin. An automatic performance model-based scheduling tool for coupled climate system models. United States: N. p., 2018. Web. doi:10.1016/j.jpdc.2018.01.002.
Ding, Nan, Xue, Wei, Song, Zhenya, Fu, Haohuan, Xu, Shiming, & Zheng, Weimin. An automatic performance model-based scheduling tool for coupled climate system models. United States. https://doi.org/10.1016/j.jpdc.2018.01.002
Ding, Nan, Xue, Wei, Song, Zhenya, Fu, Haohuan, Xu, Shiming, and Zheng, Weimin. 2018. "An automatic performance model-based scheduling tool for coupled climate system models". United States. https://doi.org/10.1016/j.jpdc.2018.01.002. https://www.osti.gov/servlets/purl/1559255.
@article{osti_1559255,
title = {An automatic performance model-based scheduling tool for coupled climate system models},
author = {Ding, Nan and Xue, Wei and Song, Zhenya and Fu, Haohuan and Xu, Shiming and Zheng, Weimin},
doi = {10.1016/j.jpdc.2018.01.002},
journal = {Journal of Parallel and Distributed Computing},
number = {C},
volume = {132},
place = {United States},
year = {2018},
month = {jan}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 3 works (citation information provided by Web of Science)

Figures / Tables:

Figure 1: An example of inter-communications among components. The arrows indicate the inter-communication pattern in CESM. For example, the ATM and OCN exchange interfacial flux and state data via CPL; the CPL has grid information for both ATM and OCN, carries out intergrid interpolation of state and flux data, and then sends the new data back to the ATM and OCN.
