U.S. Department of Energy
Office of Scientific and Technical Information

Forecasting the 2013–2014 influenza season using Wikipedia

Journal Article · PLoS Computational Biology (Online)
 [1];  [1];  [1];  [1];  [2];  [1];  [1];  [3]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. Tulane Univ., New Orleans, LA (United States)
  3. Pennsylvania State Univ., State College, PA (United States)

Infectious diseases are among the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects 5% to 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013–2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and to evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow accurate prediction of ILI data several weeks before they become available. The results show that, prior to the peak of the flu season, our forecasting method produced 50% and 95% credible intervals for the 2013–2014 ILI observations that contained the actual observations for most weeks in the forecast. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of the flu season has passed.
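The abstract names the ingredients of the forecast (a disease model whose initialization and parameters are adjusted to observed ILI data, summarized as 50% and 95% credible intervals) without giving details. As a minimal sketch of that workflow, the Python below swaps in the simplest Bayesian update, importance weighting of a random parameter ensemble, rather than the authors' data assimilation scheme. The discrete-time weekly SIR model, the prior ranges for `beta`, `gamma` and `i0`, the observation noise `obs_sd`, and every function name are hypothetical assumptions chosen for illustration. Like the model described above, it has a single strain and no re-infection.

```python
import math
import random


def sir_weekly_incidence(beta, gamma, i0, weeks):
    """Weekly new infections (as a population fraction) from a discrete-time SIR model.

    Hypothetical model: one step per week, no re-infection, a single strain.
    """
    s, i = 1.0 - i0, i0
    incidence = []
    for _ in range(weeks):
        new_inf = min(s, beta * s * i)  # cannot infect more people than are susceptible
        new_rec = gamma * i             # gamma <= 1 keeps i non-negative
        s -= new_inf
        i += new_inf - new_rec
        incidence.append(new_inf)
    return incidence


def weighted_quantile(values, weights, q):
    """Smallest value whose normalized cumulative weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w / total
        if cum >= q:
            return v
    return pairs[-1][0]  # guard against floating-point round-off near q = 1


def forecast_intervals(observed, horizon, n_members=2000, obs_sd=0.002, seed=0):
    """Weight a random SIR ensemble by its fit to `observed` and return forecast quantiles.

    The forecast target is week index len(observed) + horizon - 1 (0-based), horizon >= 1.
    """
    rng = random.Random(seed)
    target = len(observed) + horizon - 1
    forecasts, log_weights = [], []
    for _ in range(n_members):
        # Hypothetical priors on transmission rate, recovery rate, initial infected fraction.
        beta = rng.uniform(0.8, 2.0)
        gamma = rng.uniform(0.5, 0.9)
        i0 = rng.uniform(1e-4, 1e-3)
        traj = sir_weekly_incidence(beta, gamma, i0, target + 1)
        # Gaussian log-likelihood of the observed weeks, up to an additive constant.
        sq_err = sum((o - m) ** 2 for o, m in zip(observed, traj))
        log_weights.append(-sq_err / (2 * obs_sd ** 2))
        forecasts.append(traj[target])
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]  # shift before exp to avoid underflow
    return {q: weighted_quantile(forecasts, weights, q)
            for q in (0.025, 0.25, 0.5, 0.75, 0.975)}


def covered(intervals, actual, level):
    """True if `actual` lies inside the central 50% or 95% interval."""
    lo_q, hi_q = {0.5: (0.25, 0.75), 0.95: (0.025, 0.975)}[level]
    return intervals[lo_q] <= actual <= intervals[hi_q]


if __name__ == "__main__":
    truth = sir_weekly_incidence(1.4, 0.7, 5e-4, 20)  # synthetic "ILI" season
    iv = forecast_intervals(truth[:8], horizon=2)     # forecast week index 9
    print(iv[0.025], iv[0.5], iv[0.975], covered(iv, truth[9], 0.95))
```

The usage block fits eight synthetic weeks and checks whether the approximate 95% interval for the week two steps ahead contains the true value, which is the kind of week-by-week coverage check the abstract reports. Weighting whole trajectories at once is easy to read but degenerates as more weeks are assimilated (nearly all weight collapses onto a few members); sequential filters such as the ensemble Kalman filter or particle filters update the ensemble one week at a time and are the usual remedy.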

Research Organization:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1214725
Journal Information:
PLoS Computational Biology (Online), Vol. 11, Issue 5; ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English

Cited By (59)

Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China journal September 2019
A statistical tool for comparing seasonal ILI surveillance data journal February 2019
Accurate regional influenza epidemics tracking using Internet search data journal March 2019
Revisiting the use of web search data for stock market movements journal September 2019
Dynamics and biases of online attention: the case of aircraft crashes journal October 2016
Influenza forecast optimization when using different surveillance data types and geographic scale journal August 2018
A Comparative Study on the Prediction of Occupational Diseases in China with Hybrid Algorithm Combing Models journal September 2019
Results from the centers for disease control and prevention’s predict the 2013–2014 Influenza Season Challenge journal July 2016
Using electronic health records and Internet search information for accurate influenza forecasting journal May 2017
Early and Real-Time Detection of Seasonal Influenza Onset journal February 2017
What to know before forecasting the flu journal October 2018
Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions journal June 2018
Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited journal February 2019
Development and validation of influenza forecasting for 64 temperate and tropical countries journal February 2019
Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data journal November 2019
Forecasting influenza in Europe using a metapopulation model incorporating cross-border commuting and air travel journal October 2020
The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks journal December 2015
Forecasting influenza in Hong Kong with Google search queries and statistical model fusion journal May 2017
Public reaction to Chikungunya outbreaks in Italy—Insights from an extensive novel data streams-based structural equation modeling analysis journal May 2018
Forecasting type-specific seasonal influenza after 26 weeks in the United States using influenza activities in other countries journal November 2019
Pharmacy students can improve access to quality medicines information by editing Wikipedia articles text January 2018
The Application of Internet-Based Sources for Public Health Surveillance (Infoveillance): Systematic Review
  • Barros, Joana M.; Duggan, Jim; Rebholz-Schuhmann, Dietrich
  • Journal of Medical Internet Research, Vol. 22, Issue 3 https://doi.org/10.2196/13680
journal January 2020
The dynamic of information-driven coordination phenomena: a transfer entropy analysis text January 2015
Memory Remains: Understanding Collective Memory in the Digital Age text January 2016
Epidemiological data challenges: planning for a more robust future through data standards text January 2018
Forecasting Based on Surveillance Data text January 2018
Predicting the Flu from Instagram preprint January 2018
Wikipedia: a tool to monitor seasonal diseases trends? journal May 2017
Forecasting influenza epidemics by integrating internet search queries and traditional surveillance data with the support vector machine regression model in Liaoning, from 2011 to 2015 journal June 2018
Evaluation of mechanistic and statistical methods in forecasting influenza-like illness text January 2018
Infectious disease prediction with kernel conditional density estimation journal September 2017
Assessing the Use of Influenza Forecasts and Epidemiological Modeling in Public Health Decision Making in the United States journal August 2018
A Smartphone-Driven Thermometer Application for Real-time Population- and Individual-Level Influenza Surveillance journal February 2018
Infectious Disease Surveillance in the Big Data Era: Towards Faster and Locally Relevant Systems journal November 2016
Use of daily Internet search query data improves real-time projections of influenza epidemics journal October 2018
Improved real-time influenza surveillance using Internet search data in eight Latin American countries journal September 2018
Forecasting type-specific seasonal influenza after 26 weeks in the United States using influenza activities in other countries journal July 2019
Epidemic forecasts as a tool for public health: interpretation and (re)calibration journal December 2017
The dynamics of information-driven coordination phenomena: A transfer entropy analysis journal April 2016
Wikipedia traffic data and electoral prediction: towards theoretically informed models journal June 2016
In search of art: rapid estimates of gallery and museum visits using Google Trends journal June 2020
Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda
  • Priedhorsky, Reid; Osthus, Dave; Daughton, Ashlynn R.
  • Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW '17 https://doi.org/10.1145/2998181.2998183
conference January 2017
Using Participatory Web-based Surveillance Data to Improve Seasonal Influenza Forecasting in Italy
  • Perrotta, Daniela; Tizzoni, Michele; Paolotti, Daniela
  • WWW '17: 26th International World Wide Web Conference, Proceedings of the 26th International Conference on World Wide Web https://doi.org/10.1145/3038912.3052670
conference April 2017
Accurate quantification of uncertainty in epidemic parameter estimates and predictions using stochastic compartmental models journal November 2018
Pharmacy students can improve access to quality medicines information by editing Wikipedia articles journal November 2018
A comparative study on predicting influenza outbreaks using different feature spaces: application of influenza-like illness data from Early Warning Alert and Response System in Syria journal January 2020
Influenza Altmetric Attention Score and its association with the influenza season in USA journal January 2020
A season for all things: Phenological imprints in Wikipedia usage and their relevance to conservation journal March 2019
Prediction of infectious disease epidemics via weighted density ensembles journal February 2018
Real Time Influenza Monitoring Using Hospital Big Data in Combination with Machine Learning Methods: Comparison Study journal January 2018
Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries journal January 2019
Subregional Nowcasts of Seasonal Influenza Using Search Trends journal January 2017
Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis journal January 2016
Combining Participatory Influenza Surveillance with Modeling and Forecasting: Three Alternative Approaches journal January 2017
Observational Needs for Improving Ocean and Coupled Reanalysis, S2S Prediction, and Decadal Prediction journal July 2019
Prediction of infectious disease epidemics via weighted density ensembles text January 2017
Development and validation of influenza forecasting for 64 temperate and tropical countries text January 2019
Assessing the Use of Influenza Forecasts and Epidemiological Modeling in Public Health Decision Making in the United States text January 2018
Subregional Nowcasts of Seasonal Influenza Using Search Trends text January 2017