OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: PRIMED for the Future: Purposing Raw Intake for Machine Learning-Enabled Detection (Final Report)

Technical Report · DOI: https://doi.org/10.2172/2204943 · OSTI ID: 2204943

The COVID-19 pandemic demonstrated how a novel, elusive, and diffuse biological threat can engender uncertainty and misinformation, and it underscored the need for flexible analytical modalities agnostic to the identity of biological material. Yet even before the pandemic, recognition of the limitations of the current, list-based approach, which focuses on known pathogens and biotoxins, and of the importance of agent-agnostic biodetection was growing within the biosecurity community. In a 2018 report on “Biodefense in the Age of Synthetic Biology,” for example, the National Academy of Sciences stated that “an overreliance on the Select Agent List is a systematic weakness affecting many aspects of the United States’ current biodefense mitigation capability.” More recently, a group of biodefense researchers proposed the identification and adoption of “bioagent-agnostic signatures (BASs)” as a way of detecting and characterizing not only existing agents but also novel ones, an approach they believe will “enable a more flexible and resilient biodefense posture.” Indeed, the future of biodetection requires novel analytics that can identify anomalies or characteristics indicating a potential threat, whether known or unknown, without looking for a specific, previously identified signature. To assess potential threats more rapidly, it is critical to develop agnostic artificial intelligence (AI)/machine learning (ML) systems that can be employed for real-time assessment of the nature and source of a perturbation. Such systems should be multi-scale and multi-dimensional, integrating sensor data from a range of biological, chemical, and physical application spaces. Emerging deep learning (DL) models demonstrate exceptional promise for identification of discriminatory features within multi-dimensional datasets.
DL models have the capacity to recognize and encode highly complex patterns in a wide range of input data modalities, including images, text, and biological/chemical/physical spectra. As such, they can execute a wide range of assessments and determinations that have traditionally required a human operator. The promise of advances in DL is apparent in the realm of human health and medicine. DL models have been validated for evaluating a variety of clinical threats to human health in a range of contexts, including infection and cancer, and they have demonstrated improved performance in predicting stroke relative to human neurologists in some categories of data. Continuing advances in AI/ML are expected to support more efficient evaluation of raw sequence, spectroscopy, and spectrometry data. For instance, recent advances in and deployment of large language models (LLMs) such as Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT) have already motivated application of these models for biological function prediction. As frameworks such as LLMs become larger and more complex in their representations, their capacity to serve as pre-trained models that can be fine-tuned for biological/biodetection purposes will similarly be amplified. While existing and emerging AI/ML methods have found broad applicability and use cases in the clinical sciences, their development for environmental evaluation and biodetection has been limited. Functionalizing such capabilities for this purpose requires an understanding of the existing technical landscape and how the respective tools and algorithms are currently being employed. This landscape awareness then allows an assessment of the current practical capabilities of existing models and of the anticipated requirements and development efforts that will be needed to adapt available algorithms for biodetection applications relevant to DHS.
Leveraging expertise in biodetection, ML, and operational biodetection, the effort described in this report comprises a systematic landscape assessment (Subtask 2.1), a comparative evaluation (Subtask 2.2), and the formulation of a value proposition (Subtask 2.3) for the prospect of ML-enabled, agnostic biodetection from raw or minimally processed datasets.

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC52-07NA27344
OSTI ID:
2204943
Report Number(s):
LLNL-TR-856921; 1086351
Country of Publication:
United States
Language:
English