OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes

Journal Article · Computational and Structural Biotechnology Journal

Viruses are underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses is a challenge for classification, not only in constructing a viral taxonomy but also in relating a virus's genotype to its phenotype. However, the diversity of viral sequences can be leveraged to classify them in metagenomic and metatranscriptomic samples, even when they lack a taxonomy. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm that builds a classification tree from 715,672 metagenome viruses. To create the tree, we clustered proportional similarity scores computed from the k-mer profiles of each metagenome virus, producing a database of metagenomic viruses. The resulting Kraken2-compatible database is available at https://www.osti.gov/biblio/1615774. We then integrated this viral classification database with databases built from NCBI genomes for use with ParaKraken (a parallelized version of Kraken, provided in Supplemental Zip 1), a metagenomic/metatranscriptomic classifier. To illustrate the breadth of the method's utility for classifying metagenome viruses, we analyzed data from a plant metagenome study identifying genotype- and compartment-specific differences between two Populus genotypes in three different compartments. We also identified a significant increase in the abundance of eight viral sequences in postmortem brains in a human metatranscriptome study comparing Autism Spectrum Disorder patients with controls.
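The abstract does not spell out the similarity score or the tree-building algorithm, so the following is only a minimal sketch under stated assumptions. It assumes "proportional similarity" means the proportional similarity index between two k-mer frequency profiles (the sum of the shared minimum frequencies, equivalently 1 minus half the L1 distance). It also swaps in simple single-linkage clustering at a few hypothetical similarity thresholds in place of the authors' dynamic programming algorithm; the function names, k, and thresholds are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import combinations

VALID = frozenset("ACGT")


def kmer_profile(seq: str, k: int) -> dict[str, float]:
    """Relative k-mer frequencies for one sequence; k-mers with non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if set(seq[i:i + k]) <= VALID
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def proportional_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Assumed PS index: sum of min(p_i, q_i), i.e. 1 - 0.5 * L1 distance (1.0 = identical)."""
    return sum(min(f, q[kmer]) for kmer, f in p.items() if kmer in q)


def cluster_at(names, sims, threshold):
    """Single-linkage components (union-find) over pairs with similarity >= threshold."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in sims.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    return {n: find(n) for n in names}


def build_lineages(seqs: dict[str, str], k: int = 4, thresholds=(0.5, 0.7, 0.9)):
    """Hypothetical pseudo-taxonomy: one cluster label per threshold, coarse to fine.

    Components at a higher threshold are always subsets of components at a lower
    threshold, so the per-level labels nest into a tree usable in place of a taxonomy.
    """
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    names = sorted(profiles)
    sims = {
        (a, b): proportional_similarity(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    levels = [cluster_at(names, sims, t) for t in sorted(thresholds)]
    return {n: tuple(level[n] for level in levels) for n in names}
```

The all-pairs loop is quadratic and only suitable for toy inputs; at the scale of 715,672 sequences a sparse similarity graph and a parallel clusterer would be needed instead.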
We also show the potential accuracy of classifying viruses by using both the JGI and NCBI viral databases to determine how unique viral sequences are. Finally, we validate classification accuracy with NCBI databases containing viruses with an assigned taxonomy by identifying pathogenic viruses in samples with known COVID-19 and cassava brown streak virus infections. Our method is a necessary first step toward better understanding the role of viruses in the microbiome, because it allows more complete identification of sequences that lack a taxonomy. Better viral classification will improve the identification of associations between viruses and their hosts, as well as between viruses and other microbiome members. Despite the lack of taxonomy, this database of metagenomic viruses can be used with any tool that relies on a taxonomy, such as Kraken, for accurate classification of viruses.
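A tool such as Kraken only needs a parent map for its taxonomy, which is why a cluster tree can stand in for one. The sketch below illustrates Kraken (version 1) style classification on a toy, hypothetical parent map: each database k-mer is labeled with the lowest common ancestor (LCA) of the genomes containing it, and a read is assigned to the node whose root-to-node path collects the most k-mer hits, with ties resolved by their LCA. It uses exact forward-strand k-mers for simplicity; real Kraken uses canonical k-mers, and Kraken 2 uses minimizers in a compact hash table, so treat this as an illustration rather than either tool's implementation.

```python
from collections import Counter


def lineage(taxid, parent):
    """Path from a node up to the root (the root's parent is None)."""
    path = []
    while taxid is not None:
        path.append(taxid)
        taxid = parent.get(taxid)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two nodes in the parent map."""
    ancestors = set(lineage(a, parent))
    for t in lineage(b, parent):
        if t in ancestors:
            return t
    raise ValueError("nodes share no common ancestor")


def build_kmer_index(genomes, parent, k):
    """Map each k-mer to the LCA of every genome (taxid, sequence) that contains it."""
    index = {}
    for taxid, seq in genomes:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer] = taxid if kmer not in index else lca(index[kmer], taxid, parent)
    return index


def classify_read(read, index, parent, k):
    """Return the node with the highest-scoring root-to-node path, or None if no hits."""
    hits = Counter(
        index[read[i:i + k]] for i in range(len(read) - k + 1) if read[i:i + k] in index
    )
    if not hits:
        return None
    # Path score = total hits on the node and all of its ancestors (Counter gives 0 if absent).
    scores = {t: sum(hits[a] for a in lineage(t, parent)) for t in hits}
    top = max(scores.values())
    winners = [t for t, s in scores.items() if s == top]
    result = winners[0]
    for t in winners[1:]:
        result = lca(result, t, parent)  # ties fall back to their common ancestor
    return result
```

For example, with a hypothetical tree in which `cladeA` groups two viruses `v1` and `v2`, k-mers shared by both are labeled `cladeA`, and a read mixing shared and `v1`-specific k-mers is still assigned to `v1`, because the `cladeA` hits count toward the `v1` path.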

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization: USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number: AC05-00OR22725
OSTI ID: 1831698
Journal Information: Computational and Structural Biotechnology Journal, Vol. 19; ISSN 2001-0370
Publisher: Elsevier
Country of Publication: United States
Language: English

References (50)

Does Rubella Cause Autism: A 2015 Reappraisal? journal February 2016
Redondoviridae, a Family of Small, Circular DNA Viruses of the Human Oro-Respiratory Tract Associated with Periodontitis and Critical Illness journal August 2019
Dynamic Modulation of the Gut Microbiota and Metabolome by Bacteriophages in a Mouse Model journal June 2019
Improved metagenomic analysis with Kraken 2 journal November 2019
Evaluation of a concatenated protein phylogeny for classification of tailed double-stranded DNA viruses belonging to the order Caudovirales journal May 2019
STAR: ultrafast universal RNA-seq aligner journal October 2012
Viruses as Winners in the Game of Life journal September 2016
Responses of In vitro-Grown Plantlets (Vitis vinifera) to Grapevine leafroll-Associated Virus-3 and PEG-Induced Drought Stress journal June 2016
Prenatal and Perinatal Risk Factors for Autism in China journal April 2010
Virus taxonomy in the age of metagenomics journal January 2017
Shared and unique responses of plants to multiple individual stresses and stress combinations: physiological and molecular mechanisms journal September 2015
The interactive effects of simultaneous biotic and abiotic stresses on plants: Mechanistic understanding from drought and pathogen combination journal March 2015
Peripheral Nervous System Manifestations of Infectious Diseases journal June 2014
Association of Human Immunodeficiency Virus Infection and Risk of Peripheral Artery Disease journal July 2018
Analyses of seven new whole genome sequences of cassava brown streak viruses in Mozambique reveals two distinct clades: evidence for new species journal March 2019
Nontargeted virus sequence discovery pipeline and virus clustering for metagenomic data journal July 2017
Fold change rank ordering statistics: a new method for detecting differentially expressed genes journal January 2014
Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads journal June 2014
Real Time Classification of Viruses in 12 Dimensions journal May 2013
IMG: the integrated microbial genomes database and comparative analysis system journal December 2011
Peach RNA viromes in six different peach cultivars journal January 2018
Association Between the Respiratory Microbiome and Susceptibility to Influenza Virus Infection journal September 2019
The Virome of Cerebrospinal Fluid: Viruses Where We Once Thought There Were None journal September 2019
Partitioning the Genetic Diversity of a Virus Family: Approach and Evaluation through a Case Study of Picornaviruses journal January 2012
Kraken: ultrafast metagenomic sequence classification using exact alignments journal January 2014
Single-cell genomics identifies cell type–specific molecular changes in autism journal May 2019
Destabilization of the gut microbiome marks the end-stage of simian immunodeficiency virus infection in wild chimpanzees: Impact of SIVcpz on the Gut Microbiome journal December 2015
Autistic disorder and viral infections journal February 2005
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses journal October 2016
Uncovering Earth’s virome journal August 2016
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks journal January 2018
ViromeScan: a new tool for metagenomic viral community profiling journal March 2016
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation journal November 2015
Autism After Infection, Febrile Episodes, and Antibiotic Use During Pregnancy: An Exploratory Study journal November 2012
Phytobiome and Transcriptional Adaptation of Populus deltoides to Acute Progressive Drought and Cyclic Drought journal January 2018
A Novel Method of Characterizing Genetic Sequences: Genome Space with Biological Distance and Applications journal March 2011
Metavir 2: new tools for viral metagenome comparison and assembled virome analysis journal January 2014
Phytovirome Analysis of Wild Plant Populations: Comparison of Double-Stranded RNA and Virion-Associated Nucleic Acid Metagenomic Approaches journal December 2019
Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks journal November 2003
Association Between Hepatitis C Virus and Chronic Kidney Disease: A Systematic Review and Meta-Analysis journal May 2018
Changes to virus taxonomy and the International Code of Virus Classification and Nomenclature ratified by the International Committee on Taxonomy of Viruses (2019) journal June 2019
Common infections with polyomaviruses and herpesviruses and neuropsychological development at 4 years of age, the Rhea birth cohort in Crete, Greece journal June 2016
Atropos: specific, sensitive, and speedy trimming of sequencing reads journal January 2017
An efficient algorithm for large-scale detection of protein families journal April 2002
The Role of the Immune System in Autism Spectrum Disorder journal August 2016
Methods for virus classification and the challenge of incorporating metagenomic sequence data journal June 2015
Association of autism with polyomavirus infection in postmortem brains journal April 2010
Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification journal August 2014
Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis journal July 2016
Kraken2 Metagenomic Virus Database (Garcia, Benjamin; Simha, Ramanuja; Garvin, Michael; Oak Ridge National Laboratory (ORNL), Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge, TN (United States); https://doi.org/10.13139/olcf/1615774) dataset January 2020