skip to main content

SciTech ConnectSciTech Connect

Title: Technical Report: Algorithm and Implementation for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive feature of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.
Authors:
 [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Publication Date:
OSTI Identifier:
1237568
Report Number(s):
LLNL-TR--680718
DOE Contract Number:
AC52-07NA27344
Resource Type:
Technical Report
Research Org:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; 59 BASIC BIOLOGICAL SCIENCES