Characterizing Sub-Cohorts via Data Normalization and Representation Learning
Abstract
The process of identifying a cohort of interest is a very challenging task. It requires manually inspecting many patient records of complex structure that might include medical coding errors and missing data. This paper presents a computational pipeline for refining the process of cohort selection based on medical concepts recorded in the electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes this data using standard vocabularies. Then a stacked denoising autoencoder is used to embed the normalized patient vectors in a low dimensional space, where the patients are subsequently clustered into sub-cohorts. The goal is to represent the cohort in a standard format and abstract variants of sub-populations. As a use-case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD), and identified four meaningful sub-cohorts using the features learned by the autoencoder. Then, each sub-cohort was explored using a set of keywords for interpretation.
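The steps named in the abstract (normalize codes to a standard vocabulary, embed patient vectors with a denoising autoencoder, cluster the embeddings) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation. The code-to-concept map, the sample records, and all names are hypothetical. A single tied-weight layer stands in for the stacked model. k-means is assumed here because the abstract does not name the clustering method.

```python
import math
import random

# Hypothetical mapping from raw/local codes to standard concepts (illustrative only).
CONCEPT_MAP = {
    "dep_local_1": "C_DEPRESSION", "F32.9": "C_DEPRESSION",
    "sleep_x": "C_INSOMNIA", "G47.00": "C_INSOMNIA",
    "pain_7": "C_CHRONIC_PAIN", "G89.4": "C_CHRONIC_PAIN",
}
VOCAB = sorted(set(CONCEPT_MAP.values()))

def normalize(record):
    """Map raw codes to standard concepts; return a binary patient vector."""
    concepts = {CONCEPT_MAP[c] for c in record if c in CONCEPT_MAP}
    return [1.0 if v in concepts else 0.0 for v in VOCAB]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class DenoisingAutoencoder:
    """One tied-weight layer; a stacked model would chain several such layers."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.b_o = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + b)
                for i, b in enumerate(self.b_o)]

    def train_step(self, x, rng, noise=0.3, lr=0.1):
        # Corrupt the input with masking noise, then reconstruct the clean input.
        x_noisy = [0.0 if rng.random() < noise else xi for xi in x]
        h = self.encode(x_noisy)
        y = self.decode(h)
        # Cross-entropy loss with sigmoid outputs: gradient at output is y - x.
        d_out = [yi - xi for yi, xi in zip(y, x)]
        d_hid = [h[j] * (1.0 - h[j]) *
                 sum(self.W[j][i] * d_out[i] for i in range(len(x)))
                 for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights receive both decoder and encoder gradients.
                self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x_noisy[i])
            self.b_h[j] -= lr * d_hid[j]
        for i in range(len(x)):
            self.b_o[i] -= lr * d_out[i]

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, centers[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

rng = random.Random(0)
# Hypothetical patient records as lists of raw codes.
records = [["F32.9", "sleep_x"], ["dep_local_1", "G47.00"],
           ["F32.9", "G89.4"], ["dep_local_1", "pain_7"],
           ["F32.9"], ["unknown_code", "G89.4"]]
vectors = [normalize(r) for r in records]

dae = DenoisingAutoencoder(len(VOCAB), 2, rng)
for _ in range(200):
    for x in vectors:
        dae.train_step(x, rng)

embeddings = [dae.encode(x) for x in vectors]  # low-dimensional patient vectors
labels = kmeans(embeddings, 2, rng)            # sub-cohort assignment
```

Note that the first two records use different raw codes but normalize to the same vector, so they necessarily receive the same embedding and the same sub-cohort label. That is the effect normalization is meant to have before representation learning.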
- Authors:
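- Rush III, Everett; Ozmen, Ozgur; Knight, Kathryn; Park, Byung; Baker, Clifton; Jones, Makoto; Ward, Merry; Nebeker, Jonathan R.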
- ORNL
- Department of Veterans Affairs
- Publication Date:
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1659604
- DOE Contract Number:
- AC05-00OR22725
- Resource Type:
- Conference
- Resource Relation:
- Conference: IEEE International Symposium on Computer Based Medical Systems (CBMS), Rochester, Minnesota, United States of America, 7/28/2020-7/30/2020
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Rush III, Everett, Ozmen, Ozgur, Knight, Kathryn, Park, Byung, Baker, Clifton, Jones, Makoto, Ward, Merry, and Nebeker, Jonathan R. Characterizing Sub-Cohorts via Data Normalization and Representation Learning. United States: N. p., 2020. Web.
Rush III, Everett, Ozmen, Ozgur, Knight, Kathryn, Park, Byung, Baker, Clifton, Jones, Makoto, Ward, Merry, & Nebeker, Jonathan R. Characterizing Sub-Cohorts via Data Normalization and Representation Learning. United States.
Rush III, Everett, Ozmen, Ozgur, Knight, Kathryn, Park, Byung, Baker, Clifton, Jones, Makoto, Ward, Merry, and Nebeker, Jonathan R. 2020. "Characterizing Sub-Cohorts via Data Normalization and Representation Learning". United States. https://www.osti.gov/servlets/purl/1659604.
@article{osti_1659604,
title = {Characterizing Sub-Cohorts via Data Normalization and Representation Learning},
author = {Rush III, Everett and Ozmen, Ozgur and Knight, Kathryn and Park, Byung and Baker, Clifton and Jones, Makoto and Ward, Merry and Nebeker, Jonathan R.},
abstractNote = {The process of identifying a cohort of interest is a very challenging task. It requires manually inspecting many patient records of complex structure that might include medical coding errors and missing data. This paper presents a computational pipeline for refining the process of cohort selection based on medical concepts recorded in the electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes this data using standard vocabularies. Then a stacked denoising autoencoder is used to embed the normalized patient vectors in a low dimensional space, where the patients are subsequently clustered into sub-cohorts. The goal is to represent the cohort in a standard format and abstract variants of sub-populations. As a use-case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD), and identified four meaningful sub-cohorts using the features learned by the autoencoder. Then, each sub-cohort was explored using a set of keywords for interpretation.},
doi = {},
url = {https://www.osti.gov/biblio/1659604},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2020},
month = {7}
}