DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Identification of mobile genetic elements with geNomad

Journal Article · Nature Biotechnology

Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad’s speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.
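
The benchmark figures above are Matthews correlation coefficients (MCC), reported as percentages. As a minimal sketch of how that metric is computed from a binary confusion matrix, the following uses the standard MCC formula; the function name and the example counts are hypothetical and are not taken from the geNomad benchmarks.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the Matthews correlation coefficient for a binary confusion matrix.

    The result lies in [-1, 1]; multiplying by 100 gives the percentage form
    used in the abstract. If any marginal total is zero the coefficient is
    undefined, and this sketch returns 0.0 by convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper).
    score = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
    print(f"MCC = {score:.3f} ({score * 100:.1f}%)")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced, for example when plasmid or virus sequences are far rarer than chromosomal ones.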

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC); USDOE Joint Genome Institute (JGI), Berkeley, CA (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science (BSS); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC02-05CH11231; 89233218CNA000001; AC05-00OR22725; AC05-76RL01830
OSTI ID:
2280932
Journal Information:
Nature Biotechnology, Vol. 42; ISSN 1087-0156
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
