U.S. Department of Energy
Office of Scientific and Technical Information

Estimating DNA coverage and abundance in metagenomes using a gamma approximation

Journal Article · Bioinformatics Online

Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluate the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets.
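The step from a fitted gamma approximation to an estimate of unsequenced bins can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: that the expected number of reads per bin follows a gamma distribution with the given `shape` and `scale`, and that the observed read count of a bin is Poisson-distributed around that expectation, which makes the counts negative binomial. The function names and parameters are hypothetical, and the gamma parameters are taken as already fitted.

```python
import math


def nb_pmf(x, shape, scale):
    """Probability that a bin receives exactly x reads.

    Assumption (not from the abstract): per-bin expected coverage is
    Gamma(shape, scale) and reads per bin are Poisson given that
    coverage, so the counts follow a negative binomial distribution.
    """
    log_p = (
        math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
        - shape * math.log1p(scale)
        + x * (math.log(scale) - math.log1p(scale))
    )
    return math.exp(log_p)


def zero_fraction(shape, scale):
    """Fraction of bins expected to receive no reads at all."""
    return (1.0 + scale) ** (-shape)


def estimate_unseen_bins(observed_bins, shape, scale):
    """Estimate how many bins were missed, given the bins seen at least once.

    observed_bins is the number of bins covered by one or more reads;
    dividing by the probability of being seen gives the estimated total.
    """
    p0 = zero_fraction(shape, scale)
    total_bins = observed_bins / (1.0 - p0)
    return total_bins - observed_bins
```

Under these assumptions, `estimate_unseen_bins(100, shape=1.0, scale=1.0)` returns `100.0`: half of all bins are expected to receive no reads, so 100 observed bins imply roughly 100 unobserved ones. In practice the gamma parameters would first have to be fitted to the observed coverage histogram, and the estimate is only meaningful when that fit is adequate.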

Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
Genomics Division
DOE Contract Number:
AC02-05CH11231
OSTI ID:
983279
Report Number(s):
LBNL-3424E
Journal Information:
Journal Name: Bioinformatics Online
Country of Publication:
United States
Language:
English