Title: Forecasting the 2013–2014 influenza season using Wikipedia

Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects 5% to 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013-2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that prior to the peak of the flu season, our forecasting method produced 50% and 95% credible intervals for the 2013-2014 ILI observations that contained the actual observations for most weeks in the forecast. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed.
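The abstract evaluates the forecast by asking how often the 50% and 95% credible intervals contained the observed ILI values. The following is a minimal sketch of that kind of coverage check, not the authors' implementation: it assumes the forecast for each week is available as a list of ensemble samples and takes equal-tailed intervals from empirical quantiles. The names `quantile`, `credible_interval`, and `coverage` are hypothetical and introduced only for illustration.

```python
from typing import Sequence, Tuple


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence (0 <= q <= 1)."""
    if not sorted_vals:
        raise ValueError("need at least one sample")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def credible_interval(samples: Sequence[float], level: float) -> Tuple[float, float]:
    """Equal-tailed interval holding `level` of the samples (assumed convention)."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


def coverage(
    ensemble_by_week: Sequence[Sequence[float]],
    observed: Sequence[float],
    level: float,
) -> float:
    """Fraction of weeks whose observation falls inside the interval."""
    if len(ensemble_by_week) != len(observed) or not observed:
        raise ValueError("need one non-empty ensemble per observed week")
    hits = 0
    for samples, obs in zip(ensemble_by_week, observed):
        low, high = credible_interval(samples, level)
        if low <= obs <= high:
            hits += 1
    return hits / len(observed)
```

Calling `coverage(ensemble_by_week, observed, 0.5)` and `coverage(ensemble_by_week, observed, 0.95)` on the same forecast gives the two coverage fractions. If the intervals are well calibrated, these should be near 0.5 and 0.95 respectively.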
 [1] ;  [1] ;  [1] ;  [1] ;  [2] ;  [1] ;  [1] ;  [3]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. Tulane Univ., New Orleans, LA (United States)
  3. Pennsylvania State Univ., State College, PA (United States)
Publication Date:
Grant/Contract Number:
Accepted Manuscript
Journal Name:
PLoS Computational Biology (Online)
Additional Journal Information:
Journal Name: PLoS Computational Biology (Online); Journal Volume: 11; Journal Issue: 5; Journal ID: ISSN 1553-7358
Public Library of Science
Research Org:
Sandia National Laboratories (SNL), Albuquerque, NM (United States)
Sponsoring Org:
Country of Publication:
United States
influenza; forecasting; online encyclopedias; seasons; public and occupational health; influenza A virus; natural history of disease; Kalman filter
OSTI Identifier: