Integrating multimodal data through interpretable heterogeneous ensembles
- Icahn School of Medicine at Mount Sinai, New York, NY (United States)
- Baylor College of Medicine, Houston, TX (United States)
- National Renewable Energy Lab. (NREL), Golden, CO (United States)
- Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities but may lose exclusive local information. The alternative late integration approach, which can address this challenge, has not been systematically studied for biomedical problems.
Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms and uses heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data and mortality due to coronavirus disease 2019 (COVID-19) from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test measurements (blood urea nitrogen and calcium), vital sign measurements (minimum oxygen saturation), and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling.
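The late integration idea described in the abstract (one local model per modality, then an ensemble over the local predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid local models, the plain score averaging used as a stand-in for the heterogeneous ensemble algorithms, the modality names `labs` and `vitals`, and the toy values are all assumptions made for illustration only.

```python
import math


def fit_centroids(X, y):
    """Fit a toy local model: the mean feature vector of each class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    negatives = [x for x, label in zip(X, y) if label == 0]
    positives = [x for x, label in zip(X, y) if label == 1]
    return mean(negatives), mean(positives)


def predict_score(model, x):
    """Return a score in (0, 1); higher means closer to the positive centroid."""
    centroid_neg, centroid_pos = model
    d_neg = math.dist(x, centroid_neg)
    d_pos = math.dist(x, centroid_pos)
    return 1.0 / (1.0 + math.exp(d_pos - d_neg))


def fit_late_integration(modalities, y):
    """Train one local model per modality; features are never concatenated."""
    return {name: fit_centroids(X, y) for name, X in modalities.items()}


def predict_late_integration(models, sample):
    """Combine the local scores into one global score by simple averaging."""
    scores = [predict_score(models[name], sample[name]) for name in models]
    return sum(scores) / len(scores)


# Hypothetical toy data: two modalities describing the same six patients.
labels = [0, 0, 0, 1, 1, 1]
modalities = {
    "labs": [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
             [3.0, 3.0], [3.2, 2.8], [2.9, 3.1]],
    "vitals": [[0.1], [0.2], [0.0], [0.9], [1.0], [0.8]],
}

models = fit_late_integration(modalities, labels)
high_risk = predict_late_integration(models, {"labs": [3.0, 3.0], "vitals": [0.9]})
low_risk = predict_late_integration(models, {"labs": [1.0, 1.0], "vitals": [0.1]})
print(f"high-risk sample score: {high_risk:.2f}")
print(f"low-risk sample score:  {low_risk:.2f}")
```

Because each modality keeps its own model, information that appears in only one modality still contributes its own score to the final prediction, which is the property the abstract contrasts with early integration. A fuller version would presumably replace the averaging step with a learned ensemble over the local predictions.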
- Research Organization:
- National Renewable Energy Laboratory (NREL), Golden, CO (United States)
- Sponsoring Organization:
- National Institutes of Health (NIH); USDOE
- Grant/Contract Number:
- AC36-08GO28308
- OSTI ID:
- 1898016
- Report Number(s):
- NREL/JA-2700-84525; MainId:85298; UUID:aae3b503-8f10-4591-9705-5947a4338735; MainAdminID:67973
- Journal Information:
- Bioinformatics Advances, Vol. 2, Issue 1; ISSN 2635-0041
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English