OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Characterizing Sub-Cohorts via Data Normalization and Representation Learning

Abstract

The process of identifying a cohort of interest is a very challenging task. It requires manually inspecting many patient records of complex structure that might include medical coding errors and missing data. This paper presents a computational pipeline for refining the process of cohort selection based on medical concepts recorded in the electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes this data using standard vocabularies. Then a stacked denoising autoencoder is used to embed the normalized patient vectors in a low dimensional space, where the patients are subsequently clustered into sub-cohorts. The goal is to represent the cohort in a standard format and abstract variants of sub-populations. As a use-case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD), and identified four meaningful sub-cohorts using the features learned by the autoencoder. Then, each sub-cohort was explored using a set of keywords for interpretation.
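The embed-then-cluster steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy binary patient vectors, the single (rather than stacked) autoencoder layer, the masking-noise rate, the learning rate, the use of k-means for clustering, and every function name below are assumptions chosen for illustration. A stacked variant would repeat the same training on the codes produced by the previous layer.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, rate, rng):
    """Masking noise: zero each feature independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]


def encode(x, W, b):
    """Affine map followed by a sigmoid: sigmoid(W x + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def train_dae(data, n_hidden, epochs=200, lr=0.5, rate=0.3, seed=0):
    """Train one denoising autoencoder layer with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]  # encoder
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]  # decoder
    b = [0.0] * n_hidden
    c = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            xt = corrupt(x, rate, rng)   # corrupted input
            h = encode(xt, W, b)         # hidden code
            y = encode(h, V, c)          # reconstruction (decoder has the same form)
            # Cross-entropy loss with a sigmoid output: the output error is
            # (reconstruction - clean input), so the layer learns to denoise.
            d_out = [yi - xi for yi, xi in zip(y, x)]
            d_hid = [hj * (1.0 - hj) * sum(d_out[i] * V[i][j] for i in range(n_in))
                     for j, hj in enumerate(h)]
            for i in range(n_in):
                for j in range(n_hidden):
                    V[i][j] -= lr * d_out[i] * h[j]
                c[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for i in range(n_in):
                    W[j][i] -= lr * d_hid[j] * xt[i]
                b[j] -= lr * d_hid[j]
    return W, b


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda j: sum((a - m) ** 2 for a, m in zip(p, centers[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy cohort: each row is a patient, each column marks the
# presence (1) or absence (0) of one normalized medical concept.
patients = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

W, b = train_dae(patients, n_hidden=2)
codes = [encode(p, W, b) for p in patients]   # low-dimensional embeddings
labels = kmeans(codes, k=2)                   # sub-cohort assignment
```

Here `codes` holds one two-dimensional embedding per patient and `labels` assigns each patient to one of two clusters. On real EHR data the input vectors would be far wider and the number of clusters would need to be chosen empirically.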

Authors:
Rush III, Everett [1]; Ozmen, Ozgur [1]; Knight, Kathryn [1]; Park, Byung [1]; Baker, Clifton [2]; Jones, Makoto [2]; Ward, Merry [2]; Nebeker, Jonathan R. [2]
  1. ORNL
  2. Department of Veterans Affairs
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1659604
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Symposium on Computer Based Medical Systems (CBMS), Rochester, Minnesota, United States of America, 7/28/2020 to 7/30/2020
Country of Publication:
United States
Language:
English

Citation Formats

Rush III, Everett, Ozmen, Ozgur, Knight, Kathryn, Park, Byung, Baker, Clifton, Jones, Makoto, Ward, Merry, and Nebeker, Jonathan R. Characterizing Sub-Cohorts via Data Normalization and Representation Learning. United States: N. p., 2020. Web.
Rush III, Everett, Ozmen, Ozgur, Knight, Kathryn, Park, Byung, Baker, Clifton, Jones, Makoto, Ward, Merry, & Nebeker, Jonathan R. Characterizing Sub-Cohorts via Data Normalization and Representation Learning. United States.
Rush III, Everett, Ozmen, Ozgur, Knight, Kathryn, Park, Byung, Baker, Clifton, Jones, Makoto, Ward, Merry, and Nebeker, Jonathan R. 2020. "Characterizing Sub-Cohorts via Data Normalization and Representation Learning". United States. https://www.osti.gov/servlets/purl/1659604.
@article{osti_1659604,
title = {Characterizing Sub-Cohorts via Data Normalization and Representation Learning},
author = {Rush III, Everett and Ozmen, Ozgur and Knight, Kathryn and Park, Byung and Baker, Clifton and Jones, Makoto and Ward, Merry and Nebeker, Jonathan R.},
abstractNote = {The process of identifying a cohort of interest is a very challenging task. It requires manually inspecting many patient records of complex structure that might include medical coding errors and missing data. This paper presents a computational pipeline for refining the process of cohort selection based on medical concepts recorded in the electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes this data using standard vocabularies. Then a stacked denoising autoencoder is used to embed the normalized patient vectors in a low dimensional space, where the patients are subsequently clustered into sub-cohorts. The goal is to represent the cohort in a standard format and abstract variants of sub-populations. As a use-case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD), and identified four meaningful sub-cohorts using the features learned by the autoencoder. Then, each sub-cohort was explored using a set of keywords for interpretation.},
doi = {},
url = {https://www.osti.gov/biblio/1659604},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2020},
month = {7}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.