An automatic performance model-based scheduling tool for coupled climate system models
Abstract
Predicting the climate system depends heavily on the efficient integration of observations and simulations of the Earth, which is regarded as a canonical example of a cyber-physical system. The climate system model, the simulation engine in this cyber-physical system, is one of the most challenging applications in scientific computing. It uses multi-physics simulation that couples multiple components, conducts decadal to millennial simulations, and has long been an important application on supercomputers. However, current climate system models suffer from inefficient task scheduling methods, resulting in intolerable simulation times. Take the Community Earth System Model (CESM), the most widely used climate system model, as an example: one major reason for its poor performance is the huge overhead of rationally distributing processes among the coupled heterogeneous components. According to a report from NCAR, every percent improvement in CESM performance frees up to the equivalent of $250,000 in computing resources in their scientific experiments. To address this challenge, our paper first constructs a lightweight and accurate performance model that captures and predicts the heterogeneous time-to-solution performance of end-to-end CESM components for a given simulation configuration. Then, based on the performance model, we further propose an efficient scheduling strategy based on the rectangular packing method to determine the best process layout among different components and the number of processes assigned to each component. Our evaluations show that we achieve an average run time reduction of 58% on CESM compared to the widely used sequential process layout at scales of 144 to 480 cores on typical CPU clusters. We can also save 4 million CPU hours when conducting one standard scientific experiment (a 2870-year simulation), which equals a saving of $40,089 at a charge of $0.01 per CPU hour. Meanwhile, our method gains a further 26% performance improvement over the heuristic branch and bound algorithm guided by the known curve-fitting performance model.
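The idea of scheduling from a performance model can be illustrated with a minimal sketch. It is not the paper's rectangular packing method: it assumes a hypothetical Amdahl-style model `serial + parallel / p` for each component, only two components (`atm`, `ocn`), and made-up coefficients, and it simply tries every split of a fixed process budget between components that run concurrently. All function names here are illustrative.

```python
from typing import Callable, Dict, Tuple

Model = Callable[[int], float]


def make_model(serial: float, parallel: float) -> Model:
    """Hypothetical Amdahl-style model: predicted run time on p processes."""
    return lambda p: serial + parallel / p


def best_concurrent_split(models: Dict[str, Model], total: int) -> Tuple[Dict[str, int], float]:
    """Exhaustively split `total` processes between two components that run
    concurrently; a layout's run time is that of its slower component."""
    (name_a, model_a), (name_b, model_b) = models.items()
    best_layout: Dict[str, int] = {}
    best_time = float("inf")
    for p_a in range(1, total):
        p_b = total - p_a
        t = max(model_a(p_a), model_b(p_b))
        if t < best_time:
            best_layout, best_time = {name_a: p_a, name_b: p_b}, t
    return best_layout, best_time


def sequential_time(models: Dict[str, Model], total: int) -> float:
    """Sequential layout: each component uses all processes, one after another."""
    return sum(model(total) for model in models.values())


# Made-up coefficients for illustration only; not measured CESM data.
models = {"atm": make_model(2.0, 960.0), "ocn": make_model(1.0, 480.0)}
layout, concurrent = best_concurrent_split(models, total=144)
sequential = sequential_time(models, total=144)
print(layout, round(concurrent, 3), round(sequential, 3))
```

Exhaustive search is only workable for a toy case like this one. With more components, mixed sequential and concurrent phases, and larger core counts the search space grows quickly, which is the kind of setting where a packing-based strategy such as the one named in the title becomes relevant.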
- Authors:
- Ding, Nan; Xue, Wei; Song, Zhenya; Fu, Haohuan; Xu, Shiming; Zheng, Weimin
- Tsinghua Univ., Beijing (China). Dept. of Computer Science and Technology; Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling
- Tsinghua Univ., Beijing (China). Dept. of Computer Science and Technology; Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling; National Supercomputing Center, Wuxi (China); Joint Center for Global Change Studies, Beijing (China)
- Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; State Oceanic Administration, Qingdao (China). First Inst. of Oceanography
- Qingdao National Lab. for Marine Science and Technology (China). Lab. for Regional Oceanography and Numerical Modeling; Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling; National Supercomputing Center, Wuxi (China); Joint Center for Global Change Studies, Beijing (China)
- Tsinghua Univ., Beijing (China). Ministry of Education Key Lab. for Earth System Modeling
- Tsinghua Univ., Beijing (China). Dept. of Computer Science and Technology
- Publication Date:
- Research Org.:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1559255
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Parallel and Distributed Computing
- Additional Journal Information:
- Journal Volume: 132; Journal Issue: C; Journal ID: ISSN 0743-7315
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 58 GEOSCIENCES; 97 MATHEMATICS AND COMPUTING
Citation Formats
Ding, Nan, Xue, Wei, Song, Zhenya, Fu, Haohuan, Xu, Shiming, and Zheng, Weimin. An automatic performance model-based scheduling tool for coupled climate system models. United States: N. p., 2018. Web. doi:10.1016/j.jpdc.2018.01.002.
Ding, Nan, Xue, Wei, Song, Zhenya, Fu, Haohuan, Xu, Shiming, & Zheng, Weimin. An automatic performance model-based scheduling tool for coupled climate system models. United States. https://doi.org/10.1016/j.jpdc.2018.01.002
Ding, Nan, Xue, Wei, Song, Zhenya, Fu, Haohuan, Xu, Shiming, and Zheng, Weimin. Wed . "An automatic performance model-based scheduling tool for coupled climate system models". United States. https://doi.org/10.1016/j.jpdc.2018.01.002. https://www.osti.gov/servlets/purl/1559255.
@article{osti_1559255,
title = {An automatic performance model-based scheduling tool for coupled climate system models},
author = {Ding, Nan and Xue, Wei and Song, Zhenya and Fu, Haohuan and Xu, Shiming and Zheng, Weimin},
abstractNote = {Predicting the climate system depends heavily on the efficient integration of observations and simulations of the Earth, which is regarded as a canonical example of a cyber-physical system. The climate system model, the simulation engine in this cyber-physical system, is one of the most challenging applications in scientific computing. It uses multi-physics simulation that couples multiple components, conducts decadal to millennial simulations, and has long been an important application on supercomputers. However, current climate system models suffer from inefficient task scheduling methods, resulting in intolerable simulation times. Take the Community Earth System Model (CESM), the most widely used climate system model, as an example: one major reason for its poor performance is the huge overhead of rationally distributing processes among the coupled heterogeneous components. According to a report from NCAR, every percent improvement in CESM performance frees up to the equivalent of $250,000 in computing resources in their scientific experiments. To address this challenge, our paper first constructs a lightweight and accurate performance model that captures and predicts the heterogeneous time-to-solution performance of end-to-end CESM components for a given simulation configuration. Then, based on the performance model, we further propose an efficient scheduling strategy based on the rectangular packing method to determine the best process layout among different components and the number of processes assigned to each component. Our evaluations show that we achieve an average run time reduction of 58% on CESM compared to the widely used sequential process layout at scales of 144 to 480 cores on typical CPU clusters. We can also save 4 million CPU hours when conducting one standard scientific experiment (a 2870-year simulation), which equals a saving of $40,089 at a charge of $0.01 per CPU hour. Meanwhile, our method gains a further 26% performance improvement over the heuristic branch and bound algorithm guided by the known curve-fitting performance model.},
doi = {10.1016/j.jpdc.2018.01.002},
journal = {Journal of Parallel and Distributed Computing},
number = C,
volume = 132,
place = {United States},
year = {2018},
month = {1}
}