U.S. Department of Energy
Office of Scientific and Technical Information

Characterizing Sub-Cohorts via Data Normalization and Representation Learning

Conference

Identifying a cohort of interest is challenging: it requires manually inspecting many complex patient records that may contain medical coding errors and missing data. This paper presents a computational pipeline that refines cohort selection based on the medical concepts recorded in electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes these data using standard vocabularies. A stacked denoising autoencoder then embeds the normalized patient vectors in a low-dimensional space, where the patients are clustered into sub-cohorts. The goal is to represent the cohort in a standard format and to abstract the variants of its sub-populations. As a use case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD) and identified four meaningful sub-cohorts from the features learned by the autoencoder. Each sub-cohort was then explored with a set of keywords to aid interpretation.
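The normalize, embed, and cluster stages described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and several pieces are assumptions: the code-to-concept map (`CODE_TO_CONCEPT`) is a hypothetical stand-in for a standard vocabulary, "stacked" is realized as greedy layer-wise training of denoising layers with masking noise (no fine-tuning), the layer sizes, noise rate, and learning rate are arbitrary, and k-means is used as a placeholder because the abstract does not name the clustering algorithm. The keyword-based interpretation step is not shown.

```python
import math
import random

# Hypothetical code-to-concept map standing in for a standard vocabulary
# (ICD-9/ICD-10 codes rolled up to shared concepts). Illustrative only.
CODE_TO_CONCEPT = {
    "296.20": "depression", "F32.9": "depression",
    "300.02": "anxiety", "F41.1": "anxiety",
    "309.81": "ptsd", "F43.10": "ptsd",
    "305.00": "alcohol_use", "F10.10": "alcohol_use",
}
CONCEPTS = sorted(set(CODE_TO_CONCEPT.values()))


def normalize(raw_codes):
    """Map raw codes to a binary concept vector; unmapped codes are dropped."""
    found = {CODE_TO_CONCEPT[c] for c in raw_codes if c in CODE_TO_CONCEPT}
    return [1.0 if concept in found else 0.0 for concept in CONCEPTS]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class DenoisingLayer:
    """One denoising autoencoder layer trained to rebuild clean input from masked input."""

    def __init__(self, n_in, n_hidden, rng):
        s = 1.0 / math.sqrt(n_in)
        self.W = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in
        self.rng = rng

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bk)
                for row, bk in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(v * hk for v, hk in zip(row, h)) + cj)
                for row, cj in zip(self.V, self.c)]

    def loss(self, data):
        """Mean cross-entropy reconstruction loss on clean (uncorrupted) inputs."""
        eps, total = 1e-9, 0.0
        for x in data:
            r = self.decode(self.encode(x))
            total -= sum(xj * math.log(rj + eps) + (1 - xj) * math.log(1 - rj + eps)
                         for xj, rj in zip(x, r))
        return total / len(data)

    def train(self, data, epochs=200, lr=0.1, noise=0.2):
        for _ in range(epochs):
            for x in self.rng.sample(data, len(data)):  # shuffled SGD pass
                # Masking noise: zero each input with probability `noise`.
                xt = [0.0 if self.rng.random() < noise else xi for xi in x]
                h = self.encode(xt)
                r = self.decode(h)
                d_out = [rj - xj for rj, xj in zip(r, x)]  # dCE/d(decoder pre-activation)
                d_hid = [sum(d_out[j] * self.V[j][k] for j in range(len(x))) * h[k] * (1 - h[k])
                         for k in range(len(h))]
                for j, dj in enumerate(d_out):
                    for k, hk in enumerate(h):
                        self.V[j][k] -= lr * dj * hk
                    self.c[j] -= lr * dj
                for k, dk in enumerate(d_hid):
                    for i, xi in enumerate(xt):
                        self.W[k][i] -= lr * dk * xi
                    self.b[k] -= lr * dk


def stacked_embed(data, layer_sizes, rng, **train_kw):
    """Greedy layer-wise stacking: each layer is trained on the previous layer's codes."""
    layers, current = [], data
    for n_hidden in layer_sizes:
        layer = DenoisingLayer(len(current[0]), n_hidden, rng)
        layer.train(current, **train_kw)
        layers.append(layer)
        current = [layer.encode(x) for x in current]
    return current, layers


def kmeans(points, k, rng, iters=50):
    """Plain k-means (placeholder clustering step) returning labels and centers."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy records: two loose patterns plus an unmappable code ("XYZ").
rng = random.Random(0)
records = ([["F32.9", "F41.1"], ["296.20", "300.02"], ["F32.9", "300.02", "XYZ"]] * 4
           + [["F32.9", "F43.10", "F10.10"], ["296.20", "309.81", "305.00"]] * 4)
X = [normalize(r) for r in records]
embedding, layers = stacked_embed(X, [3, 2], rng)
labels, _ = kmeans(embedding, 2, rng)
print(labels)
```

In the paper's setting the concept vocabulary is far larger, the embedding would be learned over 1.8 million patients with a scalable framework rather than pure Python loops, and the number of clusters (four in the MDD use case) would be chosen by inspecting the results; the toy sizes here only keep the example readable.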

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1659604
Country of Publication:
United States
Language:
English

Similar Records

A framework for inferring and analyzing pharmacotherapy treatment patterns
Journal Article · 2024 · BMC Medical Informatics and Decision Making (Online) · OSTI ID:2469817

A Knowledge Network-Based Approach to Facilitate Annotation of Clinical Pathway Component Clusters
Conference · 2021 · OSTI ID:1817487

Ontologizing health systems data at scale: making translational discovery a reality
Journal Article · 2023 · npj Digital Medicine · OSTI ID:2470834
