DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Efficient surrogate modeling methods for large-scale Earth system models based on machine-learning techniques

Abstract

Improving predictive understanding of Earth system variability and change requires data–model integration. Efficient data–model integration for complex models requires surrogate modeling to reduce model evaluation time. However, building a surrogate of a large-scale Earth system model (ESM) with many output variables is computationally intensive because it involves a large number of expensive ESM simulations. In this effort, we propose an efficient surrogate method capable of using a few ESM runs to build an accurate and fast-to-evaluate surrogate system of model outputs over large spatial and temporal domains. We first use singular value decomposition to reduce the output dimensions and then use Bayesian optimization techniques to generate an accurate neural network surrogate model based on limited ESM simulation samples. Our machine-learning-based surrogate methods can build and evaluate a large surrogate system of many variables quickly. Thus, whenever the quantities of interest change, such as a different objective function, a new site, and a longer simulation time, we can simply extract the information of interest from the surrogate system without rebuilding new surrogates, which significantly reduces computational efforts. We apply the proposed method to a regional ecosystem model to approximate the relationship between eight model parameters and 42 660 carbon flux outputs. Results indicate that using only 20 model simulations, we can build an accurate surrogate system of the 42 660 variables, wherein the consistency between the surrogate prediction and actual model simulation is 0.93 and the mean squared error is 0.02. This highly accurate and fast-to-evaluate surrogate system will greatly enhance the computational efficiency of data–model integration to improve predictions and advance our understanding of the Earth system.
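The two-step idea in the abstract (compress the outputs with singular value decomposition, then learn a map from parameters to the few retained coefficients) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: `toy_model` is a hypothetical stand-in for an expensive simulator, the 99% variance cutoff is arbitrary, and an ordinary least-squares fit replaces the neural network tuned by Bayesian optimization that the abstract describes. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, n_outputs, n_train, n_test = 8, 2000, 20, 50

# Hypothetical toy simulator (NOT the ecosystem model from the abstract):
# the outputs depend on the parameters through three hidden factors.
mixing = 0.2 * rng.normal(size=(n_params, 3))
patterns = rng.normal(size=(3, n_outputs))

def toy_model(theta):
    """Map (n_samples, n_params) parameters to (n_samples, n_outputs) outputs."""
    return np.tanh(theta @ mixing) @ patterns

theta_train = rng.uniform(-1.0, 1.0, size=(n_train, n_params))
theta_test = rng.uniform(-1.0, 1.0, size=(n_test, n_params))
y_train, y_test = toy_model(theta_train), toy_model(theta_test)

# Step 1: SVD of the centered training outputs. Keep the leading right
# singular vectors that explain at least 99% of the training variance
# (the cutoff is an illustrative choice).
y_mean = y_train.mean(axis=0)
_, s, vt = np.linalg.svd(y_train - y_mean, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99)) + 1
components = vt[:k]                                # (k, n_outputs)
coeff_train = (y_train - y_mean) @ components.T    # (n_train, k)

# Step 2: fit a map from parameters to the k coefficients. Least squares
# with an intercept is a simple stand-in for the neural network surrogate.
def design(theta):
    return np.hstack([np.ones((theta.shape[0], 1)), theta])

beta, *_ = np.linalg.lstsq(design(theta_train), coeff_train, rcond=None)

def surrogate(theta):
    """Predict all outputs: regress the coefficients, then undo the SVD."""
    return design(theta) @ beta @ components + y_mean

y_pred = surrogate(theta_test)
mse_surrogate = float(np.mean((y_pred - y_test) ** 2))
mse_baseline = float(np.mean((y_mean - y_test) ** 2))  # always predict the mean
print(f"retained components: {k}")
print(f"test MSE, surrogate: {mse_surrogate:.4g}; mean-only baseline: {mse_baseline:.4g}")
```

Because the regression targets only `k` coefficients rather than every output, any subset of outputs can be read off `surrogate(theta)` afterwards without refitting, which mirrors the reuse argument made in the abstract.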

Authors:
Lu, Dan [1]; Ricciuto, Daniel M. [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
2019-05-06
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1513382
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Geoscientific Model Development (Online)
Additional Journal Information:
Journal Name: Geoscientific Model Development (Online); Journal Volume: 12; Journal Issue: 5; Journal ID: ISSN 1991-9603
Publisher:
European Geosciences Union
Country of Publication:
United States
Language:
English
Subject:
58 GEOSCIENCES

Citation Formats

Lu, Dan, and Ricciuto, Daniel M. Efficient surrogate modeling methods for large-scale Earth system models based on machine-learning techniques. United States: N. p., 2019. Web. doi:10.5194/gmd-12-1791-2019.
Lu, Dan, & Ricciuto, Daniel M. Efficient surrogate modeling methods for large-scale Earth system models based on machine-learning techniques. United States. https://doi.org/10.5194/gmd-12-1791-2019
Lu, Dan, and Ricciuto, Daniel M. 2019. "Efficient surrogate modeling methods for large-scale Earth system models based on machine-learning techniques". United States. https://doi.org/10.5194/gmd-12-1791-2019. https://www.osti.gov/servlets/purl/1513382.
@article{osti_1513382,
title = {Efficient surrogate modeling methods for large-scale Earth system models based on machine-learning techniques},
author = {Lu, Dan and Ricciuto, Daniel M.},
abstractNote = {Abstract. Improving predictive understanding of Earth system variability and change requires data–model integration. Efficient data–model integration for complex models requires surrogate modeling to reduce model evaluation time. However, building a surrogate of a large-scale Earth system model (ESM) with many output variables is computationally intensive because it involves a large number of expensive ESM simulations. In this effort, we propose an efficient surrogate method capable of using a few ESM runs to build an accurate and fast-to-evaluate surrogate system of model outputs over large spatial and temporal domains. We first use singular value decomposition to reduce the output dimensions and then use Bayesian optimization techniques to generate an accurate neural network surrogate model based on limited ESM simulation samples. Our machine-learning-based surrogate methods can build and evaluate a large surrogate system of many variables quickly. Thus, whenever the quantities of interest change, such as a different objective function, a new site, and a longer simulation time, we can simply extract the information of interest from the surrogate system without rebuilding new surrogates, which significantly reduces computational efforts. We apply the proposed method to a regional ecosystem model to approximate the relationship between eight model parameters and 42 660 carbon flux outputs. Results indicate that using only 20 model simulations, we can build an accurate surrogate system of the 42 660 variables, wherein the consistency between the surrogate prediction and actual model simulation is 0.93 and the mean squared error is 0.02. This highly accurate and fast-to-evaluate surrogate system will greatly enhance the computational efficiency of data–model integration to improve predictions and advance our understanding of the Earth system.},
doi = {10.5194/gmd-12-1791-2019},
journal = {Geoscientific Model Development (Online)},
number = 5,
volume = 12,
place = {United States},
year = {2019},
month = {5}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 19 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: Schematic of sELM, with processes shown using blue boxes with dependencies on environmental data; eight uncertain parameter inputs are listed in orange ovals, and model state variables are indicated by green shapes. Parameters are input to one or more processes as indicated by blue arrows. Model state variables may be outputs for some processes and input for other processes as indicated by red arrows.

Works referencing / citing this record:

DeepClimGAN: A High-Resolution Climate Data Generator
preprint, January 2020


Extending a land-surface model with Sphagnum moss to simulate responses of a northern temperate bog to whole ecosystem warming and elevated CO2
journal, January 2021

  • Shi, Xiaoying; Ricciuto, Daniel M.; Thornton, Peter E.
  • Biogeosciences, Vol. 18, Issue 2
  • DOI: 10.5194/bg-18-467-2021