OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

Journal Article · Biogeosciences (Online)
[1]; [2]; [3]; [4]; [1]; [1]
  1. Boston Univ., MA (United States). Department of Earth and Environment
  2. RK Analytics, Durham, NC (United States)
  3. Harvard Univ., Cambridge, MA (United States). Department of Organismic and Evolutionary Biology
  4. Northern Arizona Univ., Flagstaff, AZ (United States). School of Informatics, Computing and Cyber Systems and Center for Ecosystem Science and Society

Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty to model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large datasets. We employ an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. The analysis involved informative priors constructed from a meta-analysis of the primary literature and specification of both model and data uncertainties, and it introduced novel approaches to correcting for autocorrelation across multiple data streams and to emulating the sufficient-statistics surface. We report the integration of this method within the ecological workflow management software Predictive Ecosystem Analyzer (PEcAn), and its application and validation with two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach to standard brute-force MCMC involving multiple data constraints showed that the emulator method constrained the parameters of the faster and simpler SIPNET model with performance comparable to the brute-force approach while reducing computation time by more than 2 orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques.
Both models are constrained after assimilation of the observational data with the emulator method, reducing the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to efficiently calibrate complex models.
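The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea (emulate a sufficient statistic from a few model runs, then run MCMC against the emulator instead of the model), not the PEcAn implementation. It assumes a hypothetical one-parameter `toy_model` and a synthetic dataset with a known true value (`theta_true = 2.0`), echoing the synthetic-data test described above. It also assumes a Gaussian likelihood whose sufficient statistic is the sum of squared errors, a squared-exponential Gaussian-process emulator with a fixed length scale, an AR(1) effective-sample-size correction n_eff = n(1 - rho)/(1 + rho) used to temper the likelihood, and a Metropolis-within-Gibbs sampler with a conjugate Gamma update for the precision `tau`. All function names, priors, and tuning values are illustrative choices, not taken from the paper.

```python
import math
import random

# Hypothetical one-parameter "process model" standing in for SIPNET/ED2.
def toy_model(theta, times):
    return [theta * t for t in times]

def sum_sq(obs, pred):
    # Sufficient statistic of a Gaussian likelihood with unknown precision.
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def lag1_autocorr(x):
    m = sum(x) / len(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    return num / sum((v - m) ** 2 for v in x)

def effective_n(n, rho):
    # AR(1) effective sample size; clipping rho to [0, 0.99] is an assumed convention.
    rho = min(max(rho, 0.0), 0.99)
    return n * (1.0 - rho) / (1.0 + rho)

def se_kernel(a, b, length):
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, length=1.0, nugget=1e-6):
    # Zero-mean GP on standardized outputs; returns the predictive-mean function.
    mu = sum(ys) / len(ys)
    sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys))
    K = [[se_kernel(a, b, length) + (nugget if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = chol_solve(cholesky(K), [(y - mu) / sd for y in ys])
    return lambda x: mu + sd * sum(w * se_kernel(x, xi, length) for w, xi in zip(alpha, xs))

def calibrate(seed=42, theta_true=2.0, n_iter=5000, burn=1000):
    rng = random.Random(seed)
    times = list(range(1, 21))
    obs = [v + rng.gauss(0.0, 1.0) for v in toy_model(theta_true, times)]
    n = len(obs)

    # 1. Run the model only at a small design over the prior range (parallelizable).
    design = [0.5 * i for i in range(9)]  # 0.0, 0.5, ..., 4.0
    ss_design = [sum_sq(obs, toy_model(th, times)) for th in design]
    emulate_ss = fit_gp(design, ss_design)

    # 2. Autocorrelation correction from residuals at the best design point.
    best = design[ss_design.index(min(ss_design))]
    resid = [o - p for o, p in zip(obs, toy_model(best, times))]
    n_eff = effective_n(n, lag1_autocorr(resid))
    scale = n_eff / n  # likelihood tempered by n_eff / n (assumed heuristic)

    def scaled_ss(th):
        return scale * max(emulate_ss(th), 1e-9)  # guard against negative emulated SS

    def log_lik(th, tau):
        return 0.5 * n_eff * math.log(tau) - 0.5 * tau * scaled_ss(th)

    # 3. Metropolis-within-Gibbs sampler that queries only the emulator.
    a0, b0 = 0.001, 0.001  # vague Gamma(shape, rate) prior on precision tau
    theta, tau, samples = 1.0, 1.0, []
    for it in range(n_iter):
        prop = theta + rng.gauss(0.0, 0.05)
        if 0.0 <= prop <= 4.0:  # Uniform(0, 4) prior on theta
            log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
            if log_u < log_lik(prop, tau) - log_lik(theta, tau):
                theta = prop
        # Conjugate Gibbs draw; gammavariate takes (shape, scale = 1 / rate).
        tau = rng.gammavariate(a0 + 0.5 * n_eff, 1.0 / (b0 + 0.5 * scaled_ss(theta)))
        if it >= burn:
            samples.append(theta)
    return sum(samples) / len(samples), n_eff

post_mean, n_eff = calibrate()
print(f"posterior mean theta = {post_mean:.3f} (true 2.0), n_eff = {n_eff:.1f}")
```

In this sketch the model is evaluated only at the nine design points, and those runs are independent of one another, so they could be distributed across processors; every MCMC iteration queries only the cheap emulator. That division of labor is where an emulator can save time relative to brute-force MCMC, which calls the model at every iteration.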

Research Organization: Pennsylvania State Univ., University Park, PA (United States)
Sponsoring Organization: USDOE
Grant/Contract Number: FC02-06ER64157
OSTI ID: 1483479
Journal Information: Biogeosciences (Online), Vol. 15, Issue 19; ISSN 1726-4189
Publisher: European Geosciences Union
Country of Publication: United States
Language: English
Citation Metrics: Cited by 56 works (citation information provided by Web of Science)

Cited By (13)

What Limits Predictive Certainty of Long‐Term Carbon Uptake? journal December 2018
Parametric Controls on Vegetation Responses to Biogeochemical Forcing in the CLM5 journal September 2019
Improving plant allometry by fusing forest models and remote sensing journal April 2019
Plant profit maximization improves predictions of European forest responses to drought journal January 2020
A Statistical Model for Estimating Midday NDVI from the Geostationary Operational Environmental Satellite (GOES) 16 and 17 journal October 2019
Standardization Framework for Sustainability from Circular Economy 4.0 journal November 2019
Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system journal January 2019
Description and validation of an intermediate complexity model for ecosystem photosynthesis and evapotranspiration: ACM-GPP-ETv1 journal January 2019
The biophysics, ecology, and biogeochemistry of functionally diverse, vertically and horizontally heterogeneous ecosystems: the Ecosystem Demography model, version 2.2 – Part 2: Model evaluation for tropical South America journal January 2019
Developing and optimizing shrub parameters representing sagebrush (Artemisia spp.) ecosystems in the northern Great Basin using the Ecosystem Demography (EDv2.2) model journal January 2019
The Land Variational Ensemble Data Assimilation Framework: LAVENDAR v1.0.0 journal January 2020
A Near‐Term Iterative Forecasting System Successfully Predicts Reservoir Hydrodynamics and Partitions Uncertainty in Real Time journal November 2020
Ensemble-based retrospective analysis of the seasonal snowpack text January 2019

Similar Records

Development of an open-source regional data assimilation system in PEcAn v. 1.7.2: application to carbon cycle reanalysis across the contiguous US using SIPNET
Journal Article · April 20, 2022 · Geoscientific Model Development (Online)

Bayesian calibration of terrestrial ecosystem models: a study of advanced Markov chain Monte Carlo methods
Journal Article · September 27, 2017 · Biogeosciences (Online)

Bayesian calibration of terrestrial ecosystem models: A study of advanced Markov chain Monte Carlo methods
Journal Article · February 22, 2017 · Biogeosciences Discussions (Online)