Imputing data that are missing at high rates using a boosting algorithm

Cauthen, Katherine Regina; Lambert, Gregory; Ray, Jaideep; Lefantzi, Sophia

Conference · 1 September 2016

OSTI ID:1431477

Cauthen, Katherine Regina ^[1]; Lambert, Gregory ^[2]; Ray, Jaideep ^[3]; Lefantzi, Sophia ^[3]

[1] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
[2] Apple Inc., Cupertino, CA (United States)
[3] Sandia National Lab. (SNL-CA), Livermore, CA (United States)

Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless a large number m of imputations is used. This paper implements an alternative, machine learning-based approach to imputing data that are missing at high rates. We use boosting to create a strong learner from a weak learner fitted to a dataset that is missing many observations. The approach may be applied to a variety of learner (model) types. It is demonstrated on a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal CAR model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.
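The general idea in the abstract (boost a weak learner fitted to the observed values, impute the missing ones from the boosted model, and score it with a pooled k-fold RMSE) can be illustrated with a minimal sketch. This is not the authors' implementation. It swaps the Bayesian spatiotemporal CAR model for a one-split regression stump, uses plain squared-error gradient boosting, and runs on synthetic data. The names `fit_stump`, `boost`, and `cv_rmse`, the 60% missingness rate, and all tuning values are assumptions made for illustration only.

```python
import math
import random


def fit_stump(x, r):
    """Weak learner (assumed stand-in): one-split regression stump fitted to targets r."""
    pairs = sorted(zip(x, r))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    # Fallback is a constant prediction if no valid split exists.
    best = (float("inf"), pairs[0][0], total / n, total / n)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical covariate values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Minimizing squared error is equivalent to minimizing this score.
        score = -(i * left_mean ** 2 + (n - i) * right_mean ** 2)
        if score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_mean, right_mean)
    _, t, lo, hi = best
    return lambda v: lo if v <= t else hi


def boost(x, y, rounds=50, rate=0.3):
    """Squared-error boosting: each round fits a stump to the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    resid = [yi - base for yi in y]
    for _ in range(rounds):
        s = fit_stump(x, resid)
        stumps.append(s)
        resid = [ri - rate * s(xi) for xi, ri in zip(x, resid)]
    return lambda v: base + rate * sum(s(v) for s in stumps)


def cv_rmse(x, y, k=5, **kw):
    """Overall RMSE pooled over the held-out points of a k-fold cross-validation."""
    idx = list(range(len(x)))
    sq = 0.0
    for f in range(k):
        test = idx[f::k]
        train = [i for i in idx if i % k != f]
        model = boost([x[i] for i in train], [y[i] for i in train], **kw)
        sq += sum((model(x[i]) - y[i]) ** 2 for i in test)
    return math.sqrt(sq / len(x))


# Synthetic data (an assumption): one covariate, with roughly 60% of responses missing.
rng = random.Random(0)
x = [rng.random() for _ in range(200)]
truth = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.1) for xi in x]
y = [t if rng.random() > 0.6 else None for t in truth]

obs = [i for i, v in enumerate(y) if v is not None]
x_obs = [x[i] for i in obs]
y_obs = [y[i] for i in obs]

# Fit on observed values only, then fill in the missing responses.
model = boost(x_obs, y_obs)
imputed = [v if v is not None else model(xi) for xi, v in zip(x, y)]

rmse_weak = cv_rmse(x_obs, y_obs, rounds=1, rate=1.0)  # single weak learner
rmse_boosted = cv_rmse(x_obs, y_obs, rounds=50)        # boosted learner
print("weak learner CV RMSE:", rmse_weak)
print("boosted CV RMSE:", rmse_boosted)
```

Comparing `rmse_weak` with `rmse_boosted` mirrors, in a toy setting, the kind of cross-validated accuracy check the abstract describes. The record itself reports no numerical results, so the sketch says nothing about the paper's findings.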

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

DOE Contract Number: AC04-94AL85000

OSTI ID: 1431477

Report Number(s): SAND-2016-9430J; 647630

Resource Relation: Conference: JSM 2016, Chicago, IL (United States), 30 Jul - 4 Aug 2016

Country of Publication: United States

Language: English

Similar Records

A General Spatiotemporal Imputation Framework for Missing Sensor Data

Conference · 31 August 2023 · OSTI ID:1431477

Tharzeen, Aabila; Munikoti, Sai; Prakash, Punit; +2 more

Spatio-Temporal Denoising Graph Autoencoders with Data Augmentation for Missing Photovoltaic Data Imputation

Conference · 23 June 2023 · OSTI ID:1431477

Fan, Yangxin; Yu, Xuanji; Wieser, Raymond J.; +9 more

Integrative analysis of transcriptomic and proteomic data of Shewanella oneidensis: missing value imputation using temporal datasets

Journal Article · 1 January 2011 · Molecular BioSystems · OSTI ID:1431477

Torres-García, Wandaliz; Brown, Steven D; Johnson, Roger; +3 more

Related Subjects

97 MATHEMATICS AND COMPUTING
multiple imputation
machine-learning
boosting
