OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity

Abstract

Background: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates.

Results: Tools designed for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates.

Conclusions: These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations, together with improved reference viral genome availability, will alleviate many of viromics' remaining limitations.
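The three detection thresholds listed in the Results can be illustrated with a short sketch. This is not the authors' pipeline: the alignment tuple layout `(start, end, identity)` and the function names `covered_fraction` and `is_detected` are hypothetical, chosen only to show how the cut-offs (≥10 kb contigs, reads at ≥90% identity, ≥75% of contig length at ≥1× coverage) might combine.

```python
from typing import Iterable, Tuple

# Cut-offs taken from the abstract; everything else here is an assumption.
MIN_CONTIG_LENGTH_BP = 10_000   # (i) contigs >= 10 kb define populations
MIN_READ_IDENTITY = 0.90        # (ii) only reads mapping at >= 90% identity count
MIN_COVERED_FRACTION = 0.75     # (iii) >= 75% of contig length at >= 1x coverage

# Hypothetical record: (start, end, identity), 0-based, end-exclusive.
Alignment = Tuple[int, int, float]


def covered_fraction(contig_length: int, alignments: Iterable[Alignment]) -> float:
    """Fraction of contig positions covered by at least one qualifying read."""
    intervals = sorted(
        (max(0, start), min(contig_length, end))
        for start, end, identity in alignments
        if identity >= MIN_READ_IDENTITY
    )
    covered = 0
    current_end = 0
    for start, end in intervals:
        # Skip empty intervals and intervals already fully counted.
        if end <= max(start, current_end):
            continue
        covered += end - max(start, current_end)
        current_end = end
    return covered / contig_length


def is_detected(contig_length: int, alignments: Iterable[Alignment]) -> bool:
    """Apply the length cut-off first, then the breadth-of-coverage cut-off."""
    if contig_length < MIN_CONTIG_LENGTH_BP:
        return False
    return covered_fraction(contig_length, alignments) >= MIN_COVERED_FRACTION


reads = [(0, 5000, 0.95), (4000, 9500, 0.92), (9000, 12000, 0.85)]
print(is_detected(12_000, reads))
```

In this example the third read falls below the identity cut-off and is ignored, so 9,500 of 12,000 positions count as covered. A real workflow would derive these intervals from a read mapper's output rather than from hand-written tuples.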

Authors:
 Roux, Simon [1]; Emerson, Joanne B. [1]; Eloe-Fadrosh, Emiley A. [2]; Sullivan, Matthew B. [3]
  1. The Ohio State Univ., Columbus, OH (United States). Department of Microbiology
  2. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  3. The Ohio State Univ., Columbus, OH (United States). Department of Microbiology and Department of Civil, Environmental and Geodetic Engineering
Publication Date:
2017-09-21
Research Org.:
Univ. of Arizona, Tucson, AZ (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23). Biological Systems Science Division
OSTI Identifier:
1424953
Grant/Contract Number:
SC0010580; SC0016440; AC02-05CH11231
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
PeerJ
Additional Journal Information:
Journal Volume: 5; Journal ID: ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Bioinformatics; Ecology; Genomics; Microbiology

Citation Formats

Roux, Simon, Emerson, Joanne B., Eloe-Fadrosh, Emiley A., and Sullivan, Matthew B. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. United States: N. p., 2017. Web. doi:10.7717/peerj.3817.
Roux, Simon, Emerson, Joanne B., Eloe-Fadrosh, Emiley A., & Sullivan, Matthew B. (2017). Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. United States. doi:10.7717/peerj.3817.
Roux, Simon, Emerson, Joanne B., Eloe-Fadrosh, Emiley A., and Sullivan, Matthew B. 2017. "Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity". United States. doi:10.7717/peerj.3817. https://www.osti.gov/servlets/purl/1424953.
@article{osti_1424953,
title = {Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity},
author = {Roux, Simon and Emerson, Joanne B. and Eloe-Fadrosh, Emiley A. and Sullivan, Matthew B.},
abstractNote = {Background: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results: Tools designed for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. Conclusions: These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations, together with improved reference viral genome availability, will alleviate many of viromics' remaining limitations.},
doi = {10.7717/peerj.3817},
journal = {PeerJ},
number = {},
volume = 5,
place = {United States},
year = {2017},
month = {9}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 3 works
Citation information provided by
Web of Science
