Estimating influenza incidence using search query deceptiveness and generalized ridge regression
Abstract
Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
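The abstract does not spell out how deceptiveness enters the regression. One plausible reading of the title is a generalized ridge regression in which each feature receives its own penalty, with larger penalties for features believed to be more deceptive. The sketch below illustrates that idea under stated assumptions: the closed form b = (X'X + K)^(-1) X'y with a diagonal penalty matrix K, a hypothetical per-feature deceptiveness score in [0, 1], a hypothetical scale factor `lam`, and purely synthetic data. It is not the authors' implementation, and it omits an intercept for brevity.

```python
import numpy as np


def generalized_ridge(X, y, penalties):
    """Minimize ||y - X b||^2 + sum_j penalties[j] * b[j]^2 in closed form.

    Illustrative helper (hypothetical name): solves (X'X + K) b = X'y,
    where K is a diagonal matrix of per-feature penalties.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = np.diag(np.asarray(penalties, dtype=float))
    return np.linalg.solve(X.T @ X + K, X.T @ y)


# Synthetic toy data, not taken from the paper.
rng = np.random.default_rng(0)
n = 200
incidence = np.sin(np.linspace(0, 4 * np.pi, n)) + 1.5

# One feature that tracks incidence and one that is unrelated to it.
informative = incidence + rng.normal(0, 0.3, n)
deceptive = rng.normal(0, 1.0, n)
X = np.column_stack([informative, deceptive])
y = incidence

# Assumption: deceptiveness scores are known (or estimated) per feature
# and are scaled into penalties by a tunable factor lam.
deceptiveness = np.array([0.1, 0.9])
lam = 50.0
penalties = lam * deceptiveness

beta_unpenalized = generalized_ridge(X, y, [0.0, 0.0])
beta_penalized = generalized_ridge(X, y, penalties)

print("no penalty:          ", beta_unpenalized)
print("deceptiveness-scaled:", beta_penalized)
```

With all penalties set to zero the function reduces to ordinary least squares, and as the penalty on a feature grows its coefficient is driven toward zero. In practice `lam` would need to be chosen by some form of validation, which this sketch does not attempt.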
- Authors:
- Priedhorsky, Reid; Daughton, Ashlynn Rae; Barnard, Martha; O'Connell, Fiona; Osthus, David Allen
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Univ. of Colorado, Boulder, CO (United States)
- Minnetonka Public Schools, MN (United States)
- Publication Date:
- Research Org.:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1597332
- Report Number(s):
- LA-UR-18-24467
- Journal ID: ISSN 1553-7358
- Grant/Contract Number:
- 89233218CNA000001; 2016-0595-ECR
- Resource Type:
- Accepted Manuscript
- Journal Name:
- PLoS Computational Biology (Online)
- Additional Journal Information:
- Journal Name: PLoS Computational Biology (Online); Journal Volume: 15; Journal Issue: 10; Journal ID: ISSN 1553-7358
- Publisher:
- Public Library of Science
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; Biological Science; Information Science
Citation Formats
Priedhorsky, Reid, Daughton, Ashlynn Rae, Barnard, Martha, O'Connell, Fiona, and Osthus, David Allen. Estimating influenza incidence using search query deceptiveness and generalized ridge regression. United States: N. p., 2019. Web. doi:10.1371/journal.pcbi.1007165.
Priedhorsky, Reid, Daughton, Ashlynn Rae, Barnard, Martha, O'Connell, Fiona, & Osthus, David Allen. Estimating influenza incidence using search query deceptiveness and generalized ridge regression. United States. https://doi.org/10.1371/journal.pcbi.1007165
Priedhorsky, Reid, Daughton, Ashlynn Rae, Barnard, Martha, O'Connell, Fiona, and Osthus, David Allen. Tue .
"Estimating influenza incidence using search query deceptiveness and generalized ridge regression". United States. https://doi.org/10.1371/journal.pcbi.1007165. https://www.osti.gov/servlets/purl/1597332.
@article{osti_1597332,
title = {Estimating influenza incidence using search query deceptiveness and generalized ridge regression},
author = {Priedhorsky, Reid and Daughton, Ashlynn Rae and Barnard, Martha and O'Connell, Fiona and Osthus, David Allen},
abstractNote = {Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.},
doi = {10.1371/journal.pcbi.1007165},
journal = {PLoS Computational Biology (Online)},
number = 10,
volume = 15,
place = {United States},
year = {2019},
month = {10}
}