Title: Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

Abstract

The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 MB to 1.6 GB, across a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contain bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, others are symbionts and so can be of biological interest. It is therefore desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which most WGS assemblers handle poorly. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 KB plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms, but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.

Authors:
Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank; Platt, Darren
Publication Date:
2006-02-06
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1167448
Report Number(s):
LBNL-6851E
DOE Contract Number:  
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: Genome Informatics at CSHL, Cold Spring Harbor, New York, September 13-16
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS; High-Throughput, Eukaryotic, Whole-Genome Shotgun Assembly, WGS

Citation Formats

Pangilinan, Jasmyn, Shapiro, Harris, Tu, Hank, and Platt, Darren. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly. United States: N. p., 2006. Web.
Pangilinan, Jasmyn, Shapiro, Harris, Tu, Hank, & Platt, Darren. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly. United States.
Pangilinan, Jasmyn, Shapiro, Harris, Tu, Hank, and Platt, Darren. 2006. "Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly". United States. https://www.osti.gov/servlets/purl/1167448.
@article{osti_1167448,
title = {Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly},
author = {Pangilinan, Jasmyn and Shapiro, Harris and Tu, Hank and Platt, Darren},
place = {United States},
year = {2006},
month = {2}
}
