U.S. Department of Energy
Office of Scientific and Technical Information

Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

Journal Article · Biogeosciences (Online)
 [1];  [2];  [3];  [4];  [5];  [5]
  1. Boston Univ., MA (United States). Department of Earth and Environment
  2. RK Analytics, Durham, NC (United States)
  3. Harvard Univ., Cambridge, MA (United States). Department of Organismic and Evolutionary Biology
  4. Northern Arizona Univ., Flagstaff, AZ (United States). School of Informatics, Computing and Cyber Systems and Center for Ecosystem Science and Society
  5. Boston Univ., MA (United States). Department of Earth and Environment

Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty around model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large datasets. We employ an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. Analysis involved informative priors constructed from a meta-analysis of the primary literature and specification of both model and data uncertainties, and it introduced novel approaches to autocorrelation corrections on multiple data streams and emulating the sufficient statistics surface. We report the integration of this method within the ecological workflow management software Predictive Ecosystem Analyzer (PEcAn), and its application and validation with two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach to standard brute-force MCMC involving multiple data constraints showed that the emulator method was able to constrain the faster and simpler SIPNET model's parameters with comparable performance to the brute-force approach but reduced computation time by more than two orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques. Both models are constrained after assimilation of the observational data with the emulator method, reducing the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to efficiently calibrate complex models.
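
As a rough illustration of the emulation idea described in the abstract, the sketch below runs a model at a small design of parameter values, fits a Gaussian process to the resulting sufficient statistic (here a sum of squared errors), and then runs MCMC against the cheap emulator instead of the model. It is a minimal sketch under stated assumptions, not the PEcAn implementation: `toy_model`, the one-parameter setup, the fixed kernel settings, the uniform prior, and the use of the emulator mean only (ignoring its predictive variance) are all simplifications introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an expensive ecosystem model (one parameter).
def toy_model(theta, x):
    return theta * x

# Synthetic observations generated from a known "true" parameter.
x = np.linspace(0.0, 1.0, 50)
theta_true, sigma = 2.0, 0.3
y_obs = toy_model(theta_true, x) + rng.normal(0.0, sigma, x.size)

# Sufficient statistic for a Gaussian likelihood with known sigma.
def sum_sq_error(theta):
    return float(np.sum((y_obs - toy_model(theta, x)) ** 2))

# Step 1: evaluate the model at a small design of parameter values.
# These runs are independent, so they could be executed in parallel.
knots = np.linspace(0.0, 4.0, 15)
ss = np.array([sum_sq_error(t) for t in knots])

# Step 2: fit a Gaussian process emulator to the standardized SS surface.
# Kernel length scale and jitter are fixed by assumption, not estimated.
def rbf(a, b, length=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

ss_mean, ss_scale = ss.mean(), ss.std()
K = rbf(knots, knots) + 1e-8 * np.eye(knots.size)
alpha = np.linalg.solve(K, (ss - ss_mean) / ss_scale)

def emulate_ss(theta):
    # GP posterior mean of the sufficient statistic at a new theta.
    k = rbf(np.array([theta], dtype=float), knots)[0]
    return float(k @ alpha) * ss_scale + ss_mean

# Step 3: Metropolis sampling on the emulated surface (no model runs).
def log_post(theta):
    if not 0.0 <= theta <= 4.0:  # uniform prior over the design range
        return -np.inf
    return -0.5 * emulate_ss(theta) / sigma**2

theta, samples = 1.0, []
lp = log_post(theta)
for _ in range(5000):
    prop = theta + rng.normal(0.0, 0.1)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)

posterior = np.array(samples[1000:])  # discard burn-in
print("posterior mean:", posterior.mean(), "sd:", posterior.std())
```

In this toy setting the model is run only 15 times, while the 5000 MCMC iterations query the emulator alone; that separation is where the computational savings described above come from.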

Research Organization:
Pennsylvania State Univ., University Park, PA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
FC02-06ER64157
OSTI ID:
1483479
Journal Information:
Biogeosciences (Online), Vol. 15, Issue 19; ISSN 1726-4189
Publisher:
European Geosciences Union
Country of Publication:
United States
Language:
English
