OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Imputing data that are missing at high rates using a boosting algorithm

Conference · OSTI ID: 1431477
 [1];  [2];  [3];  [3]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Apple Inc., Cupertino, CA (United States)
  3. Sandia National Lab. (SNL-CA), Livermore, CA (United States)

Traditional multiple imputation approaches may perform poorly on datasets with high rates of missingness unless a large number of imputations, m, is used. This paper implements an alternative, machine learning-based approach to imputing data that are missing at high rates: boosting is used to build a strong learner from a weak learner fitted to a dataset with many missing observations. The approach can be applied to many types of learners (models). It is demonstrated on a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal conditional autoregressive (CAR) model is boosted to produce imputations, and the overall root-mean-square error (RMSE) from a k-fold cross-validation is used to assess imputation accuracy.
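
The general idea of boosting a weak learner to impute a sparsely observed response, then scoring it by k-fold cross-validated RMSE, can be illustrated with a minimal sketch. This is not the paper's method: the weak learner here is a one-covariate least-squares line (componentwise L2 boosting) standing in for the Bayesian spatiotemporal CAR model, the data are synthetic, and every name and setting (`fit_weak`, `boost`, `kfold_rmse`, the 60% missingness rate, 200 rounds, learning rate 0.1) is an assumption chosen for illustration.

```python
import math
import random


def fit_weak(rows, residuals):
    """Weak learner (assumed): least-squares line on the single best covariate."""
    n = len(rows)
    mean_r = sum(residuals) / n
    best = None
    for j in range(len(rows[0])):
        col = [row[j] for row in rows]
        mean_x = sum(col) / n
        sxx = sum((x - mean_x) ** 2 for x in col)
        if sxx == 0.0:
            continue  # a constant covariate cannot explain any residual variation
        slope = sum((x - mean_x) * (r - mean_r) for x, r in zip(col, residuals)) / sxx
        intercept = mean_r - slope * mean_x
        sse = sum((r - intercept - slope * x) ** 2 for x, r in zip(col, residuals))
        if best is None or sse < best[0]:
            best = (sse, j, intercept, slope)
    return best[1], best[2], best[3]


def boost(rows, ys, rounds=200, rate=0.1):
    """L2 boosting: repeatedly fit the weak learner to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    learners = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        j, a, b = fit_weak(rows, residuals)
        learners.append((j, a, b))
        preds = [p + rate * (a + b * row[j]) for p, row in zip(preds, rows)]
    return base, rate, learners


def predict(model, row):
    base, rate, learners = model
    return base + rate * sum(a + b * row[j] for j, a, b in learners)


def kfold_rmse(rows, ys, k=5, seed=0):
    """Overall RMSE pooled across all k held-out folds."""
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        model = boost([rows[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - predict(model, rows[i])) ** 2 for i in test)
    return math.sqrt(sq_err / len(ys))


# Synthetic stand-in data (assumed): two covariates, response missing about 60% of the time.
rng = random.Random(1)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
ys = []
for a, b in xs:
    y = 2.0 * a - b + rng.gauss(0, 0.3)
    ys.append(None if rng.random() < 0.6 else y)

obs = [i for i, y in enumerate(ys) if y is not None]
mis = [i for i, y in enumerate(ys) if y is None]

# Assess accuracy on the observed rows, then impute the missing responses.
rmse = kfold_rmse([xs[i] for i in obs], [ys[i] for i in obs])
model = boost([xs[i] for i in obs], [ys[i] for i in obs])
imputed = {i: predict(model, xs[i]) for i in mis}
print(f"observed={len(obs)} missing={len(mis)} cv_rmse={rmse:.3f}")
```

Pooling the squared errors over all folds before taking the square root gives a single overall RMSE rather than an average of per-fold values; which of the two the paper reports beyond "overall RMSE" is not stated in the abstract.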

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sandia National Lab. (SNL-CA), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1431477
Report Number(s):
SAND-2016-9430J; 647630
Resource Relation:
Conference: JSM 2016, Chicago, IL (United States), 30 Jul - 4 Aug 2016
Country of Publication:
United States
Language:
English