Text-based Analytics for Biosurveillance
The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline, includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related to biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when). The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline, includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related to biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when).
- Research Organization:
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1440619
- Report Number(s):
- PNNL-SA-126938
- Resource Relation:
- Related Information: Advanced Data Analytics in Health, 93:117-131
- Country of Publication:
- United States
- Language:
- English
Similar Records
Machine Learning for Identifying Relevance to Biosurveillance in Multilingual Text
Epi Archive: automated data collection of notifiable disease data