DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Flexible, cluster-based analysis of the electronic medical record of sepsis with composite mixture models

Abstract

The widespread adoption of electronic medical records (EMRs) in healthcare has provided vast new amounts of data for statistical machine learning researchers in their efforts to model and predict patient health status, potentially enabling novel advances in treatment. In the case of sepsis, a debilitating, dysregulated host response to infection, extracting subtle, uncataloged clinical phenotypes from the EMR with statistical machine learning methods has the potential to impact patient diagnosis and treatment early in the course of their hospitalization. However, there are significant barriers that must be overcome to extract these insights from EMR data. First, EMR datasets consist of both static and dynamic observations of discrete and continuous-valued variables, many of which may be missing, precluding the application of standard multivariate analysis techniques. Second, clinical populations observed via EMRs and relevant to the study and management of conditions like sepsis are often heterogeneous; properly accounting for this heterogeneity is critical. Here, we describe an unsupervised, probabilistic framework called a composite mixture model that can simultaneously accommodate the wide variety of observations frequently observed in EMR datasets, characterize heterogeneous clinical populations, and handle missing observations. In conclusion, we demonstrate the efficacy of our approach on a large-scale sepsis cohort, developing novel techniques built on our model-based clusters to track patient mortality risk over time and identify physiological trends and distinct subgroups of the dataset associated with elevated risk of mortality during hospitalization.
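
To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It fits a toy mixture by expectation-maximization (EM) over one continuous (Gaussian) and one discrete (categorical) variable, and handles missing values by leaving them out of the likelihood. Everything specific here is an assumption made for illustration: the function name `fit_composite_mixture`, the encoding of missing values (`NaN` for continuous, `-1` for categorical), the conditional independence of the two variables given the cluster, the quantile-based initialization, and the add-one smoothing. The model described in the paper may differ on all of these points.

```python
import numpy as np


def fit_composite_mixture(x_cont, x_cat, n_levels, k=2, n_iter=50):
    """Toy EM for a mixture over one Gaussian and one categorical variable.

    Assumed encoding (hypothetical): NaN marks a missing continuous value,
    -1 marks a missing categorical value. Within a cluster the two
    variables are treated as independent, so a missing value simply
    contributes nothing to that row's likelihood.
    """
    x_cont = np.asarray(x_cont, dtype=float)
    x_cat = np.asarray(x_cat, dtype=int)
    obs_c = ~np.isnan(x_cont)   # rows with an observed continuous value
    obs_d = x_cat >= 0          # rows with an observed categorical value
    xc = x_cont[obs_c]

    # Deterministic initialization: means at evenly spaced quantiles.
    weights = np.full(k, 1.0 / k)
    mu = np.quantile(xc, (np.arange(k) + 0.5) / k)
    var = np.full(k, xc.var() + 1e-6)
    probs = np.full((k, n_levels), 1.0 / n_levels)

    for _ in range(n_iter):
        # E-step: per-row log posterior over clusters, observed terms only.
        log_r = np.tile(np.log(weights), (len(x_cont), 1))
        log_r[obs_c] += -0.5 * (
            np.log(2 * np.pi * var) + (xc[:, None] - mu) ** 2 / var
        )
        log_r[obs_d] += np.log(probs[:, x_cat[obs_d]]).T
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: each parameter is updated from the rows that observe it.
        weights = resp.mean(axis=0)
        rc = resp[obs_c]
        mu = (rc * xc[:, None]).sum(axis=0) / rc.sum(axis=0)
        var = (rc * (xc[:, None] - mu) ** 2).sum(axis=0) / rc.sum(axis=0) + 1e-6
        probs = np.ones((k, n_levels))  # add-one smoothing of level counts
        for level in range(n_levels):
            probs[:, level] += resp[obs_d & (x_cat == level)].sum(axis=0)
        probs /= probs.sum(axis=1, keepdims=True)

    return weights, mu, var, probs, resp
```

The returned `resp` array holds each row's cluster membership probabilities from the final E-step. A row with both values missing falls back to the mixture weights, which is one simple way such a model can still assign every patient a soft cluster membership.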

Authors:
 [1];  [1];  [1];  [2];  [2];  [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  2. Kaiser Permanente Northern California, Oakland, CA (United States)
Publication Date:
2017-12-02
Research Org.:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1477828
Report Number(s):
LLNL-JRNL-730845
Journal ID: ISSN 1532-0464; 881645
Grant/Contract Number:  
AC52-07NA27344
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Biomedical Informatics
Additional Journal Information:
Journal Volume: 78; Journal Issue: C; Journal ID: ISSN 1532-0464
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Electronic health records; Mixture modeling; Risk stratification; Sepsis; Composite mixture model; Cluster analysis

Citation Formats

Mayhew, Michael B., Petersen, Brenden K., Sales, Ana Paula, Greene, John D., Liu, Vincent X., and Wasson, Todd S. Flexible, cluster-based analysis of the electronic medical record of sepsis with composite mixture models. United States: N. p., 2017. Web. doi:10.1016/j.jbi.2017.11.015.
Mayhew, Michael B., Petersen, Brenden K., Sales, Ana Paula, Greene, John D., Liu, Vincent X., & Wasson, Todd S. Flexible, cluster-based analysis of the electronic medical record of sepsis with composite mixture models. United States. https://doi.org/10.1016/j.jbi.2017.11.015
Mayhew, Michael B., Petersen, Brenden K., Sales, Ana Paula, Greene, John D., Liu, Vincent X., and Wasson, Todd S. 2017. "Flexible, cluster-based analysis of the electronic medical record of sepsis with composite mixture models". United States. https://doi.org/10.1016/j.jbi.2017.11.015. https://www.osti.gov/servlets/purl/1477828.
@article{osti_1477828,
title = {Flexible, cluster-based analysis of the electronic medical record of sepsis with composite mixture models},
author = {Mayhew, Michael B. and Petersen, Brenden K. and Sales, Ana Paula and Greene, John D. and Liu, Vincent X. and Wasson, Todd S.},
abstractNote = {The widespread adoption of electronic medical records (EMRs) in healthcare has provided vast new amounts of data for statistical machine learning researchers in their efforts to model and predict patient health status, potentially enabling novel advances in treatment. In the case of sepsis, a debilitating, dysregulated host response to infection, extracting subtle, uncataloged clinical phenotypes from the EMR with statistical machine learning methods has the potential to impact patient diagnosis and treatment early in the course of their hospitalization. However, there are significant barriers that must be overcome to extract these insights from EMR data. First, EMR datasets consist of both static and dynamic observations of discrete and continuous-valued variables, many of which may be missing, precluding the application of standard multivariate analysis techniques. Second, clinical populations observed via EMRs and relevant to the study and management of conditions like sepsis are often heterogeneous; properly accounting for this heterogeneity is critical. Here, we describe an unsupervised, probabilistic framework called a composite mixture model that can simultaneously accommodate the wide variety of observations frequently observed in EMR datasets, characterize heterogeneous clinical populations, and handle missing observations. In conclusion, we demonstrate the efficacy of our approach on a large-scale sepsis cohort, developing novel techniques built on our model-based clusters to track patient mortality risk over time and identify physiological trends and distinct subgroups of the dataset associated with elevated risk of mortality during hospitalization.},
doi = {10.1016/j.jbi.2017.11.015},
journal = {Journal of Biomedical Informatics},
number = {C},
volume = 78,
place = {United States},
year = {2017},
month = {12}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 19 works
Citation information provided by
Web of Science
