Skip to main content
U.S. Department of Energy
Office of Scientific and Technical Information

Text-based Analytics for Biosurveillance

Book ·

The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline, includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related to biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when). The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline, includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related to biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when).

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1440619
Report Number(s):
PNNL-SA-126938
Country of Publication:
United States
Language:
English

Similar Records

Machine Learning for Identifying Relevance to Biosurveillance in Multilingual Text
Journal Article · Tue May 22 00:00:00 EDT 2018 · Online Journal of Public Health Informatics · OSTI ID:1629052

An Overview of Internet biosurveillance
Journal Article · Fri Jun 21 00:00:00 EDT 2013 · Clinical Microbiology and Infection, 19(11):1006-1013 · OSTI ID:1132700

The Landscape of International Biosurveillance
Journal Article · Sun Jan 31 23:00:00 EST 2010 · Emerging Health Threats Journal, 3:e3 · OSTI ID:992356