OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: JAZZ; Whole Genome Shotgun Assembler

Authors:
Putnam, Nik; Chapman, Jarrod; Rokhsar, Dan; Dusheyko, Serge; Furman, Craig; Ho, Isaac; Ting, Sara; Winward, Paul
Publication Date:
March 26, 2007
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1165586
Report Number(s):
LBNL-6839E
DOE Contract Number:
DE-AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: 2nd Annual JGI User Meeting, Walnut Creek, CA
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS; JAZZ, Whole-Genome Shotgun Assembler, Short Reads, Assembly

Citation Formats

Putnam, Nik, Chapman, Jarrod, Rokhsar, Dan, Dusheyko, Serge, Furman, Craig, Ho, Isaac, Ting, Sara, and Winward, Paul. JAZZ; Whole Genome Shotgun Assembler. United States: N. p., 2007. Web.
Putnam, Nik, Chapman, Jarrod, Rokhsar, Dan, Dusheyko, Serge, Furman, Craig, Ho, Isaac, Ting, Sara, & Winward, Paul. JAZZ; Whole Genome Shotgun Assembler. United States.
Putnam, Nik, Chapman, Jarrod, Rokhsar, Dan, Dusheyko, Serge, Furman, Craig, Ho, Isaac, Ting, Sara, and Winward, Paul. 2007. "JAZZ; Whole Genome Shotgun Assembler". United States. https://www.osti.gov/servlets/purl/1165586.
@inproceedings{osti_1165586,
title = {JAZZ; Whole Genome Shotgun Assembler},
author = {Putnam, Nik and Chapman, Jarrod and Rokhsar, Dan and Dusheyko, Serge and Furman, Craig and Ho, Isaac and Ting, Sara and Winward, Paul},
booktitle = {2nd Annual JGI User Meeting},
address = {Walnut Creek, CA},
note = {Report LBNL-6839E},
year = {2007},
month = mar,
url = {https://www.osti.gov/servlets/purl/1165586}
}

Similar Records:
  • Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.
  • In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes whose sequencing data range from terabytes to petabytes. According to the performance analysis, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For input parallelization, the input data are divided into virtual fragments of nearly equal size, and the start and end positions of each fragment are automatically adjusted to fall at the beginning of a read. In k-mer graph construction, to improve communication efficiency, the message size between any two processes is kept constant by increasing the number of nucleotides read in the input-parallelization step of each round in proportion to the number of processes. Memory usage also decreases, because only a small part of the input data is processed in each round. For graph simplification, the communication protocol reduces the number of communication loops from four to two and decreases idle communication time. The optimized assembler is denoted SWAP-Assembler 2 (SWAP2). In our experiments on the supercomputer Mira using a 4-terabyte dataset from the 1000 Genomes Project (the largest dataset ever used for assembly), SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the 300-gigabyte Yanhuang dataset, SWAP2 shows a 3X speedup and 4X better scalability compared with HipMer and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
  • Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which it is possible to construct a mapping population.
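The input-parallelization step summarized in the SWAP2 abstract above (nearly equal-size virtual fragments whose boundaries are moved to the start of a read) can be illustrated with a short sketch. This is not SWAP2's code: it assumes FASTA input already loaded into memory, with every read beginning on a `>` header line, and the function names `record_start_at_or_after` and `split_points` are hypothetical. A real parallel assembler would more likely have each process seek to its own nominal offset in a shared file and scan forward from there.

```python
def record_start_at_or_after(data: bytes, pos: int) -> int:
    """Smallest offset >= pos where a FASTA record ('>' at a line start)
    begins, or len(data) if no record starts at or after pos.
    Assumes data itself begins with a record, so offset 0 is a read start."""
    if pos <= 0:
        return 0
    # A record starts at j (j > 0) when data[j - 1] == '\n' and data[j] == '>'.
    idx = data.find(b"\n>", pos - 1)
    return len(data) if idx == -1 else idx + 1


def split_points(data: bytes, n: int) -> list[int]:
    """Cut data into n nearly equal 'virtual fragments' whose boundaries are
    moved forward to the next read start. Returns n + 1 offsets, so fragment
    i is data[points[i]:points[i + 1]]."""
    total = len(data)
    points = [0]
    for i in range(1, n):
        nominal = i * total // n  # ideal equal-size cut point
        points.append(record_start_at_or_after(data, nominal))
    points.append(total)
    return points


if __name__ == "__main__":
    reads = b">r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n>r4\nCCGG\n"
    pts = split_points(reads, 3)
    print(pts)  # [0, 18, 27, 36]
    for a, b in zip(pts, pts[1:]):
        print(reads[a:b])  # each fragment holds whole reads only
```

Because each cut point only moves forward to the next read boundary, every fragment contains whole reads, and fragment sizes differ from the ideal `len(data) / n` by roughly one record length at most.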