Characterizing Sub-Cohorts via Data Normalization and Representation Learning
- ORNL
- Department of Veterans Affairs
Identifying a cohort of interest is a challenging task. It requires manually inspecting many patient records with complex structure, which may contain medical coding errors and missing data. This paper presents a computational pipeline for refining cohort selection based on medical concepts recorded in electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes the data using standard vocabularies. A stacked denoising autoencoder then embeds the normalized patient vectors in a low-dimensional space, where the patients are clustered into sub-cohorts. The goal is to represent the cohort in a standard format and to abstract the variants of its sub-populations. As a use case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD) and identified four meaningful sub-cohorts using the features learned by the autoencoder. Each sub-cohort was then explored using a set of keywords for interpretation.
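The three stages described in the abstract (vocabulary normalization, a stacked denoising autoencoder, and clustering of the embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The code-to-concept mapping, the layer sizes, the masking-noise rate, the squared-error loss, and the use of k-means are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical mapping from raw source codes to standard concepts (illustrative only).
CODE_TO_CONCEPT = {
    "icd9:296.2": "mdd", "icd10:F32": "mdd",
    "icd9:300.0": "anxiety", "icd10:F41": "anxiety",
    "icd10:G47": "sleep_disorder", "icd10:F10": "alcohol_use",
}

def normalize(records):
    """Map each patient's raw codes to concepts; return a binary patient-by-concept matrix."""
    concepts = sorted(set(CODE_TO_CONCEPT.values()))
    index = {c: j for j, c in enumerate(concepts)}
    X = np.zeros((len(records), len(concepts)))
    for i, codes in enumerate(records):
        for code in codes:
            if code in CODE_TO_CONCEPT:  # unmapped codes are dropped
                X[i, index[CODE_TO_CONCEPT[code]]] = 1.0
    return X, concepts

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae_layer(X, n_hidden, rng, noise=0.3, lr=0.5, epochs=300):
    """Train one denoising autoencoder layer; return its encoder weights and bias."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking noise: randomly zero inputs
        H = sigmoid(Xn @ W1 + b1)                # encode the corrupted input
        Y = sigmoid(H @ W2 + b2)                 # reconstruct
        dY = (Y - X) * Y * (1.0 - Y) / n         # squared-error gradient vs. the clean input
        dH = (dY @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (Xn.T @ dH); b1 -= lr * dH.sum(axis=0)
    return W1, b1

def stacked_encode(X, layer_sizes, seed=0):
    """Greedy layer-wise training: each layer is trained on the previous layer's codes."""
    rng = np.random.default_rng(seed)
    H = X
    for n_hidden in layer_sizes:
        W, b = train_dae_layer(H, n_hidden, rng)
        H = sigmoid(H @ W + b)
    return H

def kmeans(Z, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Toy usage with four made-up patient records.
records = [
    ["icd9:296.2", "icd10:F41"],
    ["icd10:F32", "icd10:G47"],
    ["icd10:F32", "icd10:F10", "local:999"],
    ["icd9:296.2", "icd9:300.0"],
]
X, concepts = normalize(records)
Z = stacked_encode(X, layer_sizes=[3, 2])   # low-dimensional patient embeddings
labels = kmeans(Z, k=2)                     # candidate sub-cohort assignments
```

In this toy setup, two different source codes for the same diagnosis collapse into one concept column, which is the purpose of the normalization step. The resulting cluster labels are only candidate sub-cohorts and would still need interpretation, for example through the keyword exploration the abstract mentions.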
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-00OR22725
- OSTI ID:
- 1659604
- Resource Relation:
- Conference: IEEE International Symposium on Computer Based Medical Systems (CBMS), Rochester, Minnesota, United States of America, 7/28/2020 to 7/30/2020
- Country of Publication:
- United States
- Language:
- English
Similar Records
Electronic health record analysis via deep poisson factor models
EHR-BERT: A BERT-based model for effective anomaly detection in electronic health records