DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Estimating influenza incidence using search query deceptiveness and generalized ridge regression

Abstract

Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically-selected features perform as well or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
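
As an illustration of the idea in the abstract, the following is a minimal sketch of generalized ridge regression with one penalty per feature, assuming a diagonal penalty matrix scaled by a per-feature deceptiveness score. This record does not give the paper's exact formulation, so the function name `generalized_ridge`, the synthetic data, the deceptiveness values, and the scale `lam` are all hypothetical and chosen only for demonstration.

```python
import numpy as np


def generalized_ridge(X, y, penalties):
    """Minimize ||y - X b||^2 + sum_j penalties[j] * b[j]^2 in closed form.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalties = np.asarray(penalties, dtype=float)
    # Normal equations with a diagonal penalty matrix: (X'X + D) b = X'y
    A = X.T @ X + np.diag(penalties)
    return np.linalg.solve(A, X.T @ y)


# Synthetic data (assumption): one informative trace, one unrelated trace.
rng = np.random.default_rng(0)
n = 200
incidence = np.sin(np.linspace(0.0, 6.0 * np.pi, n)) + 1.0
informative = incidence + 0.1 * rng.normal(size=n)
deceptive = rng.normal(size=n)
X = np.column_stack([informative, deceptive])
y = incidence

# Hypothetical deceptiveness scores in [0, 1]; higher means penalize more.
deceptiveness = np.array([0.05, 0.95])
lam = 50.0  # overall penalty scale (assumption)

coef_uniform = generalized_ridge(X, y, lam * np.ones(2))      # ordinary ridge
coef_weighted = generalized_ridge(X, y, lam * deceptiveness)  # per-feature penalties
print("uniform penalties: ", coef_uniform)
print("weighted penalties:", coef_weighted)
```

With all penalties set to zero the function reduces to ordinary least squares, and a very large penalty on one feature drives its coefficient toward zero, which is the mechanism by which deceptiveness knowledge could down-weight coincidental features.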

Authors:
Priedhorsky, Reid [1]; Daughton, Ashlynn Rae [2]; Barnard, Martha [3]; O'Connell, Fiona [3]; Osthus, David Allen [1]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Univ. of Colorado, Boulder, CO (United States)
  3. Minnetonka Public Schools, MN (United States)
Publication Date:
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA)
OSTI Identifier:
1597332
Report Number(s):
LA-UR-18-24467
Journal ID: ISSN 1553-7358
Grant/Contract Number:  
89233218CNA000001; 2016-0595-ECR
Resource Type:
Accepted Manuscript
Journal Name:
PLoS Computational Biology (Online)
Additional Journal Information:
Journal Name: PLoS Computational Biology (Online); Journal Volume: 15; Journal Issue: 10; Journal ID: ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Biological Science; Information Science

Citation Formats

Priedhorsky, Reid, Daughton, Ashlynn Rae, Barnard, Martha, O'Connell, Fiona, and Osthus, David Allen. Estimating influenza incidence using search query deceptiveness and generalized ridge regression. United States: N. p., 2019. Web. doi:10.1371/journal.pcbi.1007165.
Priedhorsky, Reid, Daughton, Ashlynn Rae, Barnard, Martha, O'Connell, Fiona, & Osthus, David Allen. Estimating influenza incidence using search query deceptiveness and generalized ridge regression. United States. doi:10.1371/journal.pcbi.1007165.
Priedhorsky, Reid, Daughton, Ashlynn Rae, Barnard, Martha, O'Connell, Fiona, and Osthus, David Allen. Tue . "Estimating influenza incidence using search query deceptiveness and generalized ridge regression". United States. doi:10.1371/journal.pcbi.1007165. https://www.osti.gov/servlets/purl/1597332.
@article{osti_1597332,
title = {Estimating influenza incidence using search query deceptiveness and generalized ridge regression},
author = {Priedhorsky, Reid and Daughton, Ashlynn Rae and Barnard, Martha and O'Connell, Fiona and Osthus, David Allen},
abstractNote = {Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically-selected features perform as well or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.},
doi = {10.1371/journal.pcbi.1007165},
journal = {PLoS Computational Biology (Online)},
number = 10,
volume = 15,
place = {United States},
year = {2019},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 2 works
Citation information provided by
Web of Science

Works referenced in this record:

Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited
journal, February 2019


The solution path of the generalized lasso
journal, June 2011

  • Tibshirani, Ryan J.; Taylor, Jonathan
  • The Annals of Statistics, Vol. 39, Issue 3
  • DOI: 10.1214/11-AOS878

Detecting influenza epidemics using search engine query data
journal, February 2009

  • Ginsberg, Jeremy; Mohebbi, Matthew H.; Patel, Rajan S.
  • Nature, Vol. 457, Issue 7232
  • DOI: 10.1038/nature07634

Timeliness of Nongovernmental versus Governmental Global Outbreak Communications
journal, July 2012

  • Mondor, Luke; Brownstein, John S.; Chan, Emily
  • Emerging Infectious Diseases, Vol. 18, Issue 7
  • DOI: 10.3201/eid1807.120249

Evaluation of reporting timeliness of public health surveillance systems for infectious diseases
journal, July 2004


Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance
journal, October 2015


An Explicit Solution for Generalized Ridge Regression
journal, August 1975


Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda
conference, January 2017

  • Priedhorsky, Reid; Osthus, Dave; Daughton, Ashlynn R.
  • Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW '17
  • DOI: 10.1145/2998181.2998183

Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions
journal, June 2018


Importance of disease surveillance
journal, December 1974


Pathway-Based Genomics Prediction using Generalized Elastic Net
journal, March 2016


Annual estimates of the burden of seasonal influenza in the United States: A tool for strengthening influenza surveillance and preparedness
journal, January 2018

  • Rolfes, Melissa A.; Foppa, Ivo M.; Garg, Shikha
  • Influenza and Other Respiratory Viruses, Vol. 12, Issue 1
  • DOI: 10.1111/irv.12486

Validating models for disease detection using twitter
conference, January 2013

  • Bodnar, Todd; Salathé, Marcel
  • Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion
  • DOI: 10.1145/2487788.2488027

Ridge Regression: Biased Estimation for Nonorthogonal Problems
journal, February 2000


Evaluation of mechanistic and statistical methods in forecasting influenza-like illness
journal, July 2018

  • Kandula, Sasikiran; Yamana, Teresa; Pei, Sen
  • Journal of The Royal Society Interface, Vol. 15, Issue 144
  • DOI: 10.1098/rsif.2018.0174

Works referencing / citing this record:

Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited
journal, February 2019