DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

Abstract

Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty around model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large datasets. We employ an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. Analysis involved informative priors constructed from a meta-analysis of the primary literature and specification of both model and data uncertainties, and it introduced novel approaches to autocorrelation corrections on multiple data streams and emulating the sufficient statistics surface. We report the integration of this method within an ecological workflow management software, Predictive Ecosystem Analyzer (PEcAn), and its application and validation with two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach to standard brute-force MCMC involving multiple data constraints showed that the emulator method was able to constrain the faster and simpler SIPNET model's parameters with comparable performance to the brute-force approach but reduced computation time by more than 2 orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques.
Both models are constrained after assimilation of the observational data with the emulator method, reducing the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to efficiently calibrate complex models.
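
The idea of emulating a sufficient statistic can be illustrated with a minimal sketch. This is not the PEcAn implementation: the one-parameter toy model, the Gaussian likelihood with known sigma, the fixed kernel length scale, and every function and class name below are assumptions made only for illustration. The expensive model is run at a small number of design points ("knots"), the sum of squared errors is computed at each, a Gaussian process is fitted to those values, and Metropolis sampling then queries the cheap emulator instead of the model.

```python
import numpy as np

def toy_model(theta, x):
    """Hypothetical stand-in for an expensive simulator: a line through the origin."""
    return theta * x

def sse(theta, x, y_obs):
    """Sum of squared errors: a sufficient statistic for a Gaussian likelihood with known sigma."""
    return float(np.sum((y_obs - toy_model(theta, x)) ** 2))

def rbf(a, b, length):
    """Squared-exponential kernel between two 1-D arrays of parameter values."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

class GPEmulator:
    """Constant-mean Gaussian process fitted to (knot, SSE) pairs (fixed length scale, assumed)."""
    def __init__(self, knots, values, length=1.0, jitter=1e-8):
        self.knots = np.asarray(knots, dtype=float)
        values = np.asarray(values, dtype=float)
        self.length = length
        self.mean = values.mean()
        K = rbf(self.knots, self.knots, length) + jitter * np.eye(len(self.knots))
        self.alpha = np.linalg.solve(K, values - self.mean)

    def predict(self, theta):
        """GP posterior mean of the SSE at a single parameter value."""
        k = rbf(np.array([float(theta)]), self.knots, self.length)[0]
        return float(self.mean + k @ self.alpha)

def metropolis(log_post, start, n_iter, step, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    chain = np.empty(n_iter)
    cur, cur_lp = start, log_post(start)
    for i in range(n_iter):
        prop = cur + step * rng.normal()
        prop_lp = log_post(prop)
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
        chain[i] = cur
    return chain

rng = np.random.default_rng(0)
sigma = 0.5                      # observation error, assumed known
true_theta = 2.0                 # value used to generate the synthetic data
x = np.linspace(0.0, 1.0, 20)
y_obs = toy_model(true_theta, x) + rng.normal(0.0, sigma, x.size)

# Step 1: run the "expensive" model only at a few knots across the prior range.
lo, hi = 0.0, 4.0                # bounds of a uniform prior (assumed)
knots = np.linspace(lo, hi, 12)
values = [sse(k, x, y_obs) for k in knots]

# Step 2: fit the emulator to the sufficient-statistic surface.
emulator = GPEmulator(knots, values, length=1.0)

# Step 3: MCMC on the emulated surface; the model itself is never called here.
def log_post(theta):
    if theta < lo or theta > hi:
        return -np.inf
    return -emulator.predict(theta) / (2.0 * sigma ** 2)

start = float(knots[int(np.argmin(values))])
chain = metropolis(log_post, start, n_iter=6000, step=0.3, rng=rng)
posterior = chain[1000:]         # discard burn-in
post_mean = float(posterior.mean())
print("posterior mean of theta:", post_mean)
```

In this sketch the model is evaluated 12 times in total, and those 12 runs are independent of one another, so they could be executed in parallel, whereas brute-force MCMC would need one sequential model run per iteration.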

Authors:
Fer, Istem [1]; Kelly, Ryan [2]; Moorcroft, Paul R. [3]; Richardson, Andrew D. [4]; Cowdery, Elizabeth M. [1]; Dietze, Michael C. [1]
  1. Boston Univ., MA (United States). Department of Earth and Environment
  2. RK Analytics, Durham, NC (United States)
  3. Harvard Univ., Cambridge, MA (United States). Department Organismic and Evolutionary Biology
  4. Northern Arizona Univ., Flagstaff, AZ (United States). School of Informatics, Computing and Cyber Systems and Center for Ecosystem Science and Society
Publication Date:
Research Org.:
Pennsylvania State Univ., University Park, PA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1483479
Grant/Contract Number:  
FC02-06ER64157
Resource Type:
Accepted Manuscript
Journal Name:
Biogeosciences (Online)
Additional Journal Information:
Journal Name: Biogeosciences (Online); Journal Volume: 15; Journal Issue: 19; Journal ID: ISSN 1726-4189
Publisher:
European Geosciences Union
Country of Publication:
United States
Language:
English
Subject:
58 GEOSCIENCES; 97 MATHEMATICS AND COMPUTING

Citation Formats

Fer, Istem, Kelly, Ryan, Moorcroft, Paul R., Richardson, Andrew D., Cowdery, Elizabeth M., and Dietze, Michael C. Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation. United States: N. p., 2018. Web. doi:10.5194/bg-15-5801-2018.
Fer, Istem, Kelly, Ryan, Moorcroft, Paul R., Richardson, Andrew D., Cowdery, Elizabeth M., & Dietze, Michael C. Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation. United States. https://doi.org/10.5194/bg-15-5801-2018
Fer, Istem, Kelly, Ryan, Moorcroft, Paul R., Richardson, Andrew D., Cowdery, Elizabeth M., and Dietze, Michael C. "Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation". United States. https://doi.org/10.5194/bg-15-5801-2018. https://www.osti.gov/servlets/purl/1483479.
@article{osti_1483479,
title = {Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation},
author = {Fer, Istem and Kelly, Ryan and Moorcroft, Paul R. and Richardson, Andrew D. and Cowdery, Elizabeth M. and Dietze, Michael C.},
abstractNote = {Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty around model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large datasets. We employ an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. Analysis involved informative priors constructed from a meta-analysis of the primary literature and specification of both model and data uncertainties, and it introduced novel approaches to autocorrelation corrections on multiple data streams and emulating the sufficient statistics surface. We report the integration of this method within an ecological workflow management software, Predictive Ecosystem Analyzer (PEcAn), and its application and validation with two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach to standard brute-force MCMC involving multiple data constraints showed that the emulator method was able to constrain the faster and simpler SIPNET model's parameters with comparable performance to the brute-force approach but reduced computation time by more than 2 orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques. 
Both models are constrained after assimilation of the observational data with the emulator method, reducing the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to efficiently calibrate complex models.},
doi = {10.5194/bg-15-5801-2018},
journal = {Biogeosciences (Online)},
number = 19,
volume = 15,
place = {United States},
year = {2018},
month = {10}
}

Citation Metrics:
Cited by: 56 works
Citation information provided by
Web of Science

Works referencing / citing this record:

What Limits Predictive Certainty of Long‐Term Carbon Uptake?
journal, December 2018

  • Raczka, Brett; Dietze, Michael C.; Serbin, Shawn P.
  • Journal of Geophysical Research: Biogeosciences, Vol. 123, Issue 12
  • DOI: 10.1029/2018jg004504

Parametric Controls on Vegetation Responses to Biogeochemical Forcing in the CLM5
journal, September 2019

  • Fisher, Rosie A.; Wieder, William R.; Sanderson, Benjamin M.
  • Journal of Advances in Modeling Earth Systems, Vol. 11, Issue 9
  • DOI: 10.1029/2019ms001609

Improving plant allometry by fusing forest models and remote sensing
journal, April 2019

  • Fischer, Fabian Jörg; Maréchaux, Isabelle; Chave, Jérôme
  • New Phytologist, Vol. 223, Issue 3
  • DOI: 10.1111/nph.15810

Plant profit maximization improves predictions of European forest responses to drought
journal, January 2020

  • Sabot, Manon E. B.; De Kauwe, Martin G.; Pitman, Andy J.
  • New Phytologist, Vol. 226, Issue 6
  • DOI: 10.1111/nph.16376

A Statistical Model for Estimating Midday NDVI from the Geostationary Operational Environmental Satellite (GOES) 16 and 17
journal, October 2019

  • Wheeler, Kathryn I.; Dietze, Michael C.
  • Remote Sensing, Vol. 11, Issue 21
  • DOI: 10.3390/rs11212507

Standardization Framework for Sustainability from Circular Economy 4.0
journal, November 2019

  • Ávila-Gutiérrez, María Jesús; Martín-Gómez, Alejandro; Aguayo-González, Francisco
  • Sustainability, Vol. 11, Issue 22
  • DOI: 10.3390/su11226490

Evaluation of terrestrial pan-Arctic carbon cycling using a data-assimilation system
journal, January 2019

  • López-Blanco, Efrén; Exbrayat, Jean-François; Lund, Magnus
  • Earth System Dynamics, Vol. 10, Issue 2
  • DOI: 10.5194/esd-10-233-2019

Description and validation of an intermediate complexity model for ecosystem photosynthesis and evapotranspiration: ACM-GPP-ETv1
journal, January 2019

  • Smallman, Thomas Luke; Williams, Mathew
  • Geoscientific Model Development, Vol. 12, Issue 6
  • DOI: 10.5194/gmd-12-2227-2019

Developing and optimizing shrub parameters representing sagebrush (Artemisia spp.) ecosystems in the northern Great Basin using the Ecosystem Demography (EDv2.2) model
journal, January 2019

  • Pandit, Karun; Dashti, Hamid; Glenn, Nancy F.
  • Geoscientific Model Development, Vol. 12, Issue 11
  • DOI: 10.5194/gmd-12-4585-2019

The Land Variational Ensemble Data Assimilation Framework: LAVENDAR v1.0.0
journal, January 2020

  • Pinnington, Ewan; Quaife, Tristan; Lawless, Amos
  • Geoscientific Model Development, Vol. 13, Issue 1
  • DOI: 10.5194/gmd-13-55-2020

A Near‐Term Iterative Forecasting System Successfully Predicts Reservoir Hydrodynamics and Partitions Uncertainty in Real Time
journal, November 2020

  • Thomas, R. Quinn; Figueiredo, Renato J.; Daneshmand, Vahid
  • Water Resources Research, Vol. 56, Issue 11
  • DOI: 10.1029/2019wr026138

Ensemble-based retrospective analysis of the seasonal snowpack
text, January 2019