OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Preliminary High-Throughput Metagenome Assembly

Abstract

Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI's in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).

Authors:
Dusheyko, Serge; Furman, Craig; Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank
Publication Date:
2007-03-26
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1170764
Report Number(s):
LBNL-6770E
DOE Contract Number:
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: JGI 2nd Annual User's Meeting
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS; High-Throughput, Metagenome Assembly, US Sludge, Whole-Genome Shotgun

Citation Formats

Dusheyko, Serge, Furman, Craig, Pangilinan, Jasmyn, Shapiro, Harris, and Tu, Hank. Preliminary High-Throughput Metagenome Assembly. United States: N. p., 2007. Web.
Dusheyko, Serge, Furman, Craig, Pangilinan, Jasmyn, Shapiro, Harris, & Tu, Hank. (2007). Preliminary High-Throughput Metagenome Assembly. United States.
Dusheyko, Serge, Furman, Craig, Pangilinan, Jasmyn, Shapiro, Harris, and Tu, Hank. 2007. "Preliminary High-Throughput Metagenome Assembly". United States. https://www.osti.gov/servlets/purl/1170764.
@inproceedings{osti_1170764,
title = {Preliminary High-Throughput Metagenome Assembly},
author = {Dusheyko, Serge and Furman, Craig and Pangilinan, Jasmyn and Shapiro, Harris and Tu, Hank},
abstractNote = {Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI's in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).},
booktitle = {JGI 2nd Annual User's Meeting},
place = {United States},
year = {2007},
month = mar
}


  • The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 MB to 1.6 GB, over a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contains bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts, and so can be of biological interest. Therefore, it is desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which is difficult to handle in most WGS assemblers. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 KB plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms, but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.
  • New high-throughput DNA sequencing technologies have revolutionized how scientists study the organisms around us. In particular, microbiology - the study of the smallest, unseen organisms that pervade our lives - has embraced these new techniques to characterize and analyze the cellular constituents and use this information to develop novel tools, techniques, and therapeutics. So-called next-generation DNA sequencing platforms have resulted in huge increases in the amount of raw data that can be rapidly generated. Argonne National Laboratory developed the premier platform for the analysis of this new data (MG-RAST) that is used by microbiologists worldwide. This paper uses the accounting from the computational analysis of more than 10,000,000,000 bp of DNA sequence data, describes an analysis of the advanced computational requirements, and suggests the level of analysis that will be essential as microbiologists move to understand how these tiny organisms affect our everyday lives. The results from this analysis indicate that data analysis is a linear problem, but that most analyses are held up in queues. With sufficient resources, computations could be completed in a few hours for a typical dataset. These data also suggest execution times that delimit timely completion of computational analyses, and provide bounds for problematic processes.
  • A new method for processing signals produced by high resolution, large volume semiconductor detectors is described. These detectors, to be used in the next generation of spectrometer arrays for nuclear research (e.g., EUROBALL), present a set of problems such as resolution degradation due to charge trapping and ballistic deficit effects, poor resolution at a high count rate, and long-term and temperature instability. To solve these problems, a new approach based on digital Moving Window Deconvolution (MWD) has been developed.
  • Laser shot peening, a surface treatment for metals, is known to induce residual compressive stresses to depths of over 1 mm providing improved component resistance to various forms of failure. Recent information also suggests that thermal relaxation of the laser induced stress is significantly less than that experienced by other forms of surface stressing that involve significantly higher levels of cold work. We have developed a unique solid state laser technology employing Nd:glass amplifier slabs and SBS phase conjugation that enables this process to move into high throughput production processing.
  • The paper describes a new powder diffraction instrument for synchrotron radiation sources which combines the high throughput of a position-sensitive detector system with the high resolution normally only provided by a crystal analyzer. It uses the Guinier geometry which is traditionally used with an x-ray tube source. This geometry adapts well to the synchrotron source, provided proper beam conditioning is applied. The high brightness of the SR source allows a high resolution to be achieved. When combined with a photon-counting silicon microstrip detector array, the system becomes a powerful instrument for radiation-sensitive samples or time-dependent phase transition studies.