Title: Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity

Abstract

Background: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates.

Results: Tools designed for metagenomes, specifically metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates.

Conclusions: These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations, together with improved reference viral genome availability, will alleviate many of viromics' remaining limitations.
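
The three detection thresholds listed in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `MappedRead` record, the function names, and the interval-merging approach are assumptions made for illustration, and a real workflow would derive alignments from a read mapper's output rather than from hand-built records.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
MIN_CONTIG_LENGTH_BP = 10_000   # >=10 kb contigs define populations
MIN_READ_IDENTITY = 0.90        # coverage counts only reads mapping at >=90% identity
MIN_COVERED_FRACTION = 0.75     # >=75% of contig length must have >=1x coverage


@dataclass
class MappedRead:
    """Hypothetical minimal alignment record (0-based start, end-exclusive)."""
    start: int
    end: int
    identity: float  # fraction of identical bases in the alignment, 0..1


def covered_fraction(contig_length, reads):
    """Fraction of contig positions covered by at least one qualifying read."""
    if contig_length <= 0:
        return 0.0
    # Keep reads that pass the identity cut-off and clamp them to the contig.
    intervals = sorted(
        (max(r.start, 0), min(r.end, contig_length))
        for r in reads
        if r.identity >= MIN_READ_IDENTITY
    )
    covered = 0
    current_end = 0
    for start, end in intervals:
        # Skip empty intervals and intervals already fully counted.
        if end <= max(start, current_end):
            continue
        covered += end - max(start, current_end)
        current_end = end
    return covered / contig_length


def is_detected(contig_length, reads):
    """Apply the length and breadth-of-coverage thresholds to one contig."""
    if contig_length < MIN_CONTIG_LENGTH_BP:
        return False
    return covered_fraction(contig_length, reads) >= MIN_COVERED_FRACTION


example_reads = [
    MappedRead(0, 5000, 0.95),
    MappedRead(4000, 8000, 0.92),
    MappedRead(8000, 10000, 0.80),  # below the identity cut-off, ignored
]
print(is_detected(10_000, example_reads))
```

In this example the two qualifying reads cover positions 0 to 8000 of a 10 kb contig, so the population passes the 75% breadth threshold even though the third read is discarded for low identity.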

Authors:
Roux, Simon [1]; Emerson, Joanne B. [1]; Eloe-Fadrosh, Emiley A. [2]; Sullivan, Matthew B. [3]
  1. The Ohio State Univ., Columbus, OH (United States). Department of Microbiology
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  3. The Ohio State Univ., Columbus, OH (United States). Department of Microbiology and Department of Civil, Environmental and Geodetic Engineering
Publication Date:
September 2017
Research Org.:
Univ. of Arizona, Tucson, AZ (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23). Biological Systems Science Division
OSTI Identifier:
1424953
Alternate Identifier(s):
OSTI ID: 1581051
Grant/Contract Number:  
SC0010580; SC0016440; AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
PeerJ
Additional Journal Information:
Journal Volume: 5; Journal ID: ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Bioinformatics; Ecology; Genomics; Microbiology

Citation Formats

Roux, Simon, Emerson, Joanne B., Eloe-Fadrosh, Emiley A., and Sullivan, Matthew B. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. United States: N. p., 2017. Web. doi:10.7717/peerj.3817.
Roux, Simon, Emerson, Joanne B., Eloe-Fadrosh, Emiley A., & Sullivan, Matthew B. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. United States. doi:10.7717/peerj.3817.
Roux, Simon, Emerson, Joanne B., Eloe-Fadrosh, Emiley A., and Sullivan, Matthew B. 2017. "Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity". United States. doi:10.7717/peerj.3817. https://www.osti.gov/servlets/purl/1424953.
@article{osti_1424953,
title = {Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity},
author = {Roux, Simon and Emerson, Joanne B. and Eloe-Fadrosh, Emiley A. and Sullivan, Matthew B.},
abstractNote = {Background: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results: Tools designed for metagenomes, specifically metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. Conclusions: These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations, together with improved reference viral genome availability, will alleviate many of viromics' remaining limitations.},
doi = {10.7717/peerj.3817},
journal = {PeerJ},
volume = 5,
place = {United States},
year = {2017},
month = {9}
}

Citation Metrics:
Cited by: 16 works
Citation information provided by
Web of Science
