U.S. Department of Energy
Office of Scientific and Technical Information

Defending our public biological databases as a global critical infrastructure

Journal Article · Frontiers in Bioengineering and Biotechnology
 [1];  [2];  [2];  [3];  [4];  [1];  [4];  [4];  [4];  [4];  [1];  [2];  [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  3. Sandia National Lab. (SNL-CA), Livermore, CA (United States); Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  4. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world’s primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories bear compelling similarities to the Internet in its early days. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than through costly “retrofitted” mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to reduce the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity.
Research Organization:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States); Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE; USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
89233218CNA000001; AC04-94AL85000; AC05-76RL01830
OSTI ID:
1499035
Alternate ID(s):
OSTI ID: 1511631
OSTI ID: 1525104
Report Number(s):
LA-UR--19-21685; PNNL-SA--139972; SAND--2019-2221J; 672991
Journal Information:
Journal Name: Frontiers in Bioengineering and Biotechnology; Vol. 7; ISSN 2296-4185
Publisher:
Frontiers Research Foundation
Country of Publication:
United States
Language:
English
