DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Estimating influenza incidence using search query deceptiveness and generalized ridge regression

Journal Article · PLoS Computational Biology (Online)

Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
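The abstract describes using per-feature deceptiveness knowledge within a generalized ridge regression. The sketch below illustrates that general idea under stated assumptions: it uses the standard closed-form generalized ridge estimator, in which each feature gets its own penalty, and gives more deceptive features larger penalties so they are shrunk harder. The helper names `generalized_ridge` and `deceptiveness_penalties`, the linear deceptiveness-to-penalty mapping, its `base`/`scale` values, and the simulated data are all hypothetical illustrations, not the paper's exact method or data.

```python
import numpy as np


def generalized_ridge(X, y, penalties):
    """Closed-form generalized ridge regression (one penalty per feature).

    Minimizes ||y - X @ beta||^2 + sum_j penalties[j] * beta[j]**2, whose
    solution is beta = (X^T X + diag(penalties))^{-1} X^T y.
    Assumes X and y are already centered, so no intercept is fitted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Lam = np.diag(np.asarray(penalties, dtype=float))
    # Solve the normal equations instead of forming an explicit inverse.
    return np.linalg.solve(X.T @ X + Lam, X.T @ y)


def deceptiveness_penalties(deceptiveness, base=1.0, scale=100.0):
    """HYPOTHETICAL mapping from a deceptiveness score in [0, 1] to a penalty.

    More deceptive features receive a larger penalty and are shrunk harder.
    The linear form and the base/scale values are illustrative assumptions.
    """
    d = np.clip(np.asarray(deceptiveness, dtype=float), 0.0, 1.0)
    return base + scale * d


# Toy simulated data (not from the paper): two years of weekly incidence.
rng = np.random.default_rng(0)
weeks = np.arange(104)
incidence = 1.0 + np.sin(np.pi * weeks / 52.0) ** 2
informative = incidence + rng.normal(0.0, 0.3, weeks.size)
# A "deceptive" feature that happens to track incidence in this training window.
deceptive = incidence + rng.normal(0.0, 0.1, weeks.size)

X = np.column_stack([informative, deceptive])
X = X - X.mean(axis=0)            # center features
y = incidence - incidence.mean()  # center target

beta_uniform = generalized_ridge(X, y, [1.0, 1.0])
beta_aware = generalized_ridge(X, y, deceptiveness_penalties([0.0, 1.0]))
print("uniform penalties:  ", beta_uniform)
print("deceptiveness-aware:", beta_aware)
```

Raising one feature's penalty can only move its coefficient toward zero, so the deceptive feature's weight drops under the deceptiveness-aware penalties; because the two simulated features are positively correlated, the informative feature picks up part of that weight.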

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
89233218CNA000001; 2016-0595-ECR
OSTI ID:
1597332
Report Number(s):
LA-UR-18-24467
Journal Information:
PLoS Computational Biology (Online), Vol. 15, Issue 10; ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 6 works
Citation information provided by Web of Science

Cited By (1)

Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited journal February 2019

Similar Records

Forecasting the 2013–2014 influenza season using Wikipedia
Journal Article · 2015 · PLoS Computational Biology (Online) · OSTI ID:1214725

Accuracy of real-time multi-model ensemble forecasts for seasonal influenza in the U.S.
Journal Article · 2019 · PLoS Computational Biology (Online) · OSTI ID:1604056

Global disease monitoring and forecasting with Wikipedia
Journal Article · 2014 · PLoS Computational Biology (Online) · OSTI ID:1214710