DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Comparison of Infectious Disease Forecasting Methods across Locations, Diseases, and Time

Journal Article · Pathogens
[1]; [2]; [1]; [1]; [3]; [2]
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  2. Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Washington State Univ., Pullman, WA (United States)
  3. Pacific Northwest National Lab. (PNNL), Richland, WA (United States); North Carolina State Univ., Raleigh, NC (United States)

Accurate infectious disease forecasting can inform efforts to prevent outbreaks and mitigate adverse impacts. This study compares the performance of statistical, machine learning (ML), and deep learning (DL) approaches in forecasting infectious disease incidence across different countries and time intervals. We forecasted three diverse diseases (campylobacteriosis, typhoid, and Q-fever) using a wide variety of features (n = 46) from public datasets, including landscape, climate, and socioeconomic factors. We compared autoregressive statistical models to two tree-based ML models (extreme gradient boosted trees [XGB] and random forest [RF]) and two DL models (a multi-layer perceptron and an encoder–decoder model). The disease models were trained on region-level data from seven countries covering 2009 to 2017. Forecasting performance of all models was assessed using mean absolute error, root mean square error, and Poisson deviance across Australia, Israel, and the United States for January through August 2018. Model results were compared across diseases and across several data splits: country, regions with the highest and lowest case counts, and forecast horizon (i.e., nowcasting, short-term, and long-term forecasting). Overall, the XGB models performed best for all diseases and, in general, tree-based ML models performed best within the data splits. In a few instances, the statistical or DL models had marginally smaller error metrics for specific subsets of typhoid, a disease with very low case counts. Feature importance per disease was measured using four tree-based ML models (i.e., XGB and RF, each with and without region name as a feature). The most important feature groups included previous case counts, region name, population counts and density, causes of mortality from the neonatal period to under 5 years of age, sanitation factors, and elevation. This study demonstrates the power of ML approaches to incorporate a wide range of factors to forecast various diseases, regardless of location, more accurately than traditional statistical approaches.
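
The three error metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas the authors used, so this assumes the standard definitions, with Poisson deviance taken as the mean unit deviance 2(y log(y/mu) - y + mu). The function names and the case counts below are hypothetical and are not taken from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and forecast case counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error; penalizes large misses more than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance (assumed standard form, not confirmed by the source).

    Each forecast must be strictly positive; the term y * log(y / mu)
    is taken as 0 when the observed count y is 0.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if p <= 0:
            raise ValueError("Poisson deviance requires positive forecasts")
        log_term = t * math.log(t / p) if t > 0 else 0.0
        total += 2.0 * (log_term - t + p)
    return total / len(y_true)


# Hypothetical monthly case counts for one region (illustrative only).
observed = [0, 2, 5, 3]
forecast = [1.0, 2.0, 4.0, 3.0]

print("MAE:", mae(observed, forecast))
print("RMSE:", rmse(observed, forecast))
print("Poisson deviance:", mean_poisson_deviance(observed, forecast))
```

Unlike MAE and RMSE, the Poisson deviance scales errors relative to the forecast count, which can matter for low-count series such as the typhoid data described above.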

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1842680
Report Number(s):
PNNL-SA-169380
Journal Information:
Journal Name: Pathogens; Journal Volume: 11; Journal Issue: 2; ISSN 2076-0817
Publisher:
MDPI
Country of Publication:
United States
Language:
English
