DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multi-fidelity classification using Gaussian processes: Accelerating the prediction of large-scale computational models

Abstract

We report that machine learning techniques typically rely on large datasets to create accurate classifiers. However, there are situations when data is scarce and expensive to acquire. This is the case of studies that rely on state-of-the-art computational models which typically take days to run, thus hindering the potential of machine learning tools. In this work, we present a novel classifier that takes advantage of lower fidelity models and inexpensive approximations to predict the binary output of expensive computer simulations. We postulate an autoregressive model between the different levels of fidelity with Gaussian process priors. We adopt a fully Bayesian treatment for the hyper-parameters and use Markov Chain Monte Carlo samplers. We take advantage of the probabilistic nature of the classifier to implement active learning strategies. We also introduce a sparse approximation to enhance the ability of the multi-fidelity classifier to handle a large amount of low fidelity samples. We test these multi-fidelity classifiers against their single-fidelity counterpart with synthetic data, showing a median computational cost reduction of 23% for a target accuracy of 90%. In an application to cardiac electrophysiology, the multi-fidelity classifier achieves an F1 score, the harmonic mean of precision and recall, of 99.6% compared to 74.1% of a single-fidelity classifier when both are trained with 50 samples. In general, our results show that the multi-fidelity classifiers outperform their single-fidelity counterpart in terms of accuracy in all cases. Finally, we envision that this new tool will enable researchers to study classification problems that would otherwise be prohibitively expensive. Source code is available at https://github.com/fsahli/MFclass.
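
The F1 score quoted in the abstract is defined there as the harmonic mean of precision and recall. As a minimal sketch of that definition only (the function name and the confusion counts below are hypothetical and are not taken from the paper or its source code), it could be computed as follows:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive and false-negative counts.
    Returns 0.0 when there are no true positives, since precision and
    recall are then both zero (or undefined).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that are recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration; not results from the paper.
example = f1_score(tp=8, fp=2, fn=2)
```

Because the harmonic mean is pulled toward the smaller of its two arguments, a classifier scores well on F1 only when precision and recall are both high.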

Authors:
Costabal, Francisco Sahli [1]; Perdikaris, Paris [2]; Kuhl, Ellen [3]; Hurtado, Daniel E. [1]
  1. Pontificia Universidad Católica de Chile, Santiago (Chile). Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences; Millennium Nucleus for Cardiovascular Magnetic Resonance (Chile)
  2. Univ. of Pennsylvania, Philadelphia, PA (United States)
  3. Stanford Univ., CA (United States)
Publication Date:
2019-08-30
Research Org.:
Univ. of Pennsylvania, Philadelphia, PA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1595796
Alternate Identifier(s):
OSTI ID: 1564273
Grant/Contract Number:  
SC0019116
Resource Type:
Accepted Manuscript
Journal Name:
Computer Methods in Applied Mechanics and Engineering
Additional Journal Information:
Journal Volume: 357; Journal Issue: C; Related Information: https://github.com/fsahli/MFclass; Journal ID: ISSN 0045-7825
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Machine learning; Bayesian inference; Hamiltonian Monte Carlo; Data-driven modeling; Cardiac electrophysiology

Citation Formats

Costabal, Francisco Sahli, Perdikaris, Paris, Kuhl, Ellen, and Hurtado, Daniel E. Multi-fidelity classification using Gaussian processes: Accelerating the prediction of large-scale computational models. United States: N. p., 2019. Web. doi:10.1016/j.cma.2019.112602.
Costabal, Francisco Sahli, Perdikaris, Paris, Kuhl, Ellen, & Hurtado, Daniel E. Multi-fidelity classification using Gaussian processes: Accelerating the prediction of large-scale computational models. United States. https://doi.org/10.1016/j.cma.2019.112602
Costabal, Francisco Sahli, Perdikaris, Paris, Kuhl, Ellen, and Hurtado, Daniel E. 2019. "Multi-fidelity classification using Gaussian processes: Accelerating the prediction of large-scale computational models". United States. https://doi.org/10.1016/j.cma.2019.112602. https://www.osti.gov/servlets/purl/1595796.
@article{osti_1595796,
title = {Multi-fidelity classification using Gaussian processes: Accelerating the prediction of large-scale computational models},
author = {Costabal, Francisco Sahli and Perdikaris, Paris and Kuhl, Ellen and Hurtado, Daniel E.},
abstractNote = {We report that machine learning techniques typically rely on large datasets to create accurate classifiers. However, there are situations when data is scarce and expensive to acquire. This is the case of studies that rely on state-of-the-art computational models which typically take days to run, thus hindering the potential of machine learning tools. In this work, we present a novel classifier that takes advantage of lower fidelity models and inexpensive approximations to predict the binary output of expensive computer simulations. We postulate an autoregressive model between the different levels of fidelity with Gaussian process priors. We adopt a fully Bayesian treatment for the hyper-parameters and use Markov Chain Monte Carlo samplers. We take advantage of the probabilistic nature of the classifier to implement active learning strategies. We also introduce a sparse approximation to enhance the ability of the multi-fidelity classifier to handle a large amount of low fidelity samples. We test these multi-fidelity classifiers against their single-fidelity counterpart with synthetic data, showing a median computational cost reduction of 23% for a target accuracy of 90%. In an application to cardiac electrophysiology, the multi-fidelity classifier achieves an F1 score, the harmonic mean of precision and recall, of 99.6% compared to 74.1% of a single-fidelity classifier when both are trained with 50 samples. In general, our results show that the multi-fidelity classifiers outperform their single-fidelity counterpart in terms of accuracy in all cases. Finally, we envision that this new tool will enable researchers to study classification problems that would otherwise be prohibitively expensive. Source code is available at https://github.com/fsahli/MFclass.},
doi = {10.1016/j.cma.2019.112602},
journal = {Computer Methods in Applied Mechanics and Engineering},
number = {C},
volume = {357},
place = {United States},
year = {2019},
month = {aug}
}

Citation Metrics:
Cited by: 34 works
Citation information provided by
Web of Science

Works referenced in this record:

Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network
journal, January 2019


Computational fluid dynamics modelling in cardiovascular medicine
journal, October 2015


Propagation of material behavior uncertainty in a nonlinear finite element model of reconstructive surgery
journal, August 2018

  • Lee, Taeksang; Turin, Sergey Y.; Gosain, Arun K.
  • Biomechanics and Modeling in Mechanobiology, Vol. 17, Issue 6
  • DOI: 10.1007/s10237-018-1061-4

Machine learning in drug development: Characterizing the effect of 30 drugs on the QT interval using Gaussian process regression, sensitivity analysis, and uncertainty quantification
journal, May 2019

  • Sahli Costabal, Francisco; Matsuno, Kristen; Yao, Jiang
  • Computer Methods in Applied Mechanics and Engineering, Vol. 348
  • DOI: 10.1016/j.cma.2019.01.033

A generalized multi-resolution expansion for uncertainty propagation with application to cardiovascular modeling
journal, February 2017

  • Schiavazzi, D. E.; Doostan, A.; Iaccarino, G.
  • Computer Methods in Applied Mechanics and Engineering, Vol. 314
  • DOI: 10.1016/j.cma.2016.09.024

Towards efficient uncertainty quantification in complex and large-scale biomechanical problems based on a Bayesian multi-fidelity scheme
journal, September 2014

  • Biehler, Jonas; Gee, Michael W.; Wall, Wolfgang A.
  • Biomechanics and Modeling in Mechanobiology, Vol. 14, Issue 3
  • DOI: 10.1007/s10237-014-0618-0

Survey of Multifidelity Methods in Uncertainty Propagation, Inference, and Optimization
journal, January 2018

  • Peherstorfer, Benjamin; Willcox, Karen; Gunzburger, Max
  • SIAM Review, Vol. 60, Issue 3
  • DOI: 10.1137/16M1082469

Multifidelity Monte Carlo Estimation of Variance and Sensitivity Indices
journal, January 2018

  • Qian, E.; Peherstorfer, B.; O'Malley, D.
  • SIAM/ASA Journal on Uncertainty Quantification, Vol. 6, Issue 2
  • DOI: 10.1137/17M1151006

Fast uncertainty quantification of activation sequences in patient-specific cardiac electrophysiology meeting clinical time constraints: Fast uncertainty quantification in cardiac electrophysiology
journal, April 2018

  • Quaglino, A.; Pezzuto, S.; Koutsourelakis, P. S.
  • International Journal for Numerical Methods in Biomedical Engineering, Vol. 34, Issue 7
  • DOI: 10.1002/cnm.2985

A multi-resolution, non-parametric, Bayesian framework for identification of spatially-varying model parameters
journal, September 2009


Multi-fidelity Gaussian process regression for prediction of random fields
journal, May 2017


Multifidelity Information Fusion Algorithms for High-Dimensional Systems and Massive Data sets
journal, January 2016

  • Perdikaris, Paris; Venturi, Daniele; Karniadakis, George Em
  • SIAM Journal on Scientific Computing, Vol. 38, Issue 4
  • DOI: 10.1137/15M1055164

Spatial and temporal organization during cardiac fibrillation
journal, March 1998

  • Gray, Richard A.; Pertsov, Arkady M.; Jalife, José
  • Nature, Vol. 392, Issue 6671
  • DOI: 10.1038/32164

A mechanical model predicts morphological abnormalities in the developing human brain
journal, July 2014

  • Budday, Silvia; Raybaud, Charles; Kuhl, Ellen
  • Scientific Reports, Vol. 4, Issue 1
  • DOI: 10.1038/srep05644

Instabilities of soft films on compliant substrates
journal, January 2017


Particle Learning of Gaussian Process Models for Sequential Design and Optimization
journal, January 2011

  • Gramacy, Robert B.; Polson, Nicholas G.
  • Journal of Computational and Graphical Statistics, Vol. 20, Issue 1
  • DOI: 10.1198/jcgs.2010.09171

Probabilistic programming in Python using PyMC3
journal, January 2016

  • Salvatier, John; Wiecki, Thomas V.; Fonnesbeck, Christopher
  • PeerJ Computer Science, Vol. 2
  • DOI: 10.7717/peerj-cs.55

Active Learning with Statistical Models
journal, January 1996

  • Cohn, D. A.; Ghahramani, Z.; Jordan, M. I.
  • Journal of Artificial Intelligence Research, Vol. 4
  • DOI: 10.1613/jair.295

Large Sample Properties of Simulations Using Latin Hypercube Sampling
journal, May 1987


Individual Comparisons by Ranking Methods
journal, December 1945

  • Wilcoxon, Frank
  • Biometrics Bulletin, Vol. 1, Issue 6
  • DOI: 10.2307/3001968

On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other
journal, March 1947


A simple two-variable model of cardiac excitation
journal, March 1996


Interpreting Activation Mapping of Atrial Fibrillation: A Hybrid Computational/Physiological Study
journal, December 2017

  • Sahli Costabal, Francisco; Zaman, Junaid A. B.; Kuhl, Ellen
  • Annals of Biomedical Engineering, Vol. 46, Issue 2
  • DOI: 10.1007/s10439-017-1969-3

Generating Purkinje networks in the human heart
journal, August 2016


Computational modelling of electrocardiograms: repolarisation and T-wave polarity in the human heart
journal, October 2012

  • Hurtado, Daniel E.; Kuhl, Ellen
  • Computer Methods in Biomechanics and Biomedical Engineering, Vol. 17, Issue 9
  • DOI: 10.1080/10255842.2012.729582

Erratum: Spatial and temporal organization during cardiac fibrillation
journal, May 1998

  • Gray, Richard A.; Pertsov, Arkady M.; Jalife, José
  • Nature, Vol. 393, Issue 6681
  • DOI: 10.1038/30290

Publisher Correction: Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network
journal, January 2019


Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling
journal, February 2017

  • Perdikaris, P.; Raissi, M.; Damianou, A.
  • Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 473, Issue 2198
  • DOI: 10.1098/rspa.2016.0751

Scalable Variational Gaussian Process Classification
preprint, January 2014


Survey of multifidelity methods in uncertainty propagation, inference, and optimization
preprint, January 2018


Works referencing / citing this record:

Multiscale Modeling Meets Machine Learning: What Can We Learn?
journal, February 2020

  • Peng, Grace C. Y.; Alber, Mark; Buganza Tepole, Adrian
  • Archives of Computational Methods in Engineering
  • DOI: 10.1007/s11831-020-09405-5

Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences
journal, November 2019

  • Alber, Mark; Buganza Tepole, Adrian; Cannon, William R.
  • npj Digital Medicine, Vol. 2, Issue 1
  • DOI: 10.1038/s41746-019-0193-y

Multiscale modeling meets machine learning: What can we learn?
preprint, January 2019