OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Meraculous: De Novo Genome Assembly with Short Paired-End Reads

Journal Article · PLoS ONE

We describe a new algorithm, Meraculous, for whole-genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4-megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half of the assembled sequence is in contigs longer than 101 kilobases, and half is in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph consisting of oligonucleotides with unique high-quality extensions in the dataset, avoiding the explicit error-correction step used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ~280 bp or ~3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
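
The core idea of traversing only k-mers with unique extensions can be illustrated with a small sketch. This is not the Meraculous implementation: it assumes a single strand (no reverse complements), ignores base-quality scores, and approximates "high-quality extension" with a hypothetical `min_count` threshold on how many reads support each extension base. The function names `count_extensions`, `unique_ext`, and `uu_contigs` are illustrative only. A k-mer is kept only if exactly one left base and exactly one right base meet the threshold, and a walk continues only while the neighbouring k-mer agrees on the shared extension, so low-count sequencing errors never enter the traversed subgraph.

```python
from collections import Counter, defaultdict


def count_extensions(reads, k):
    """Tally, for every k-mer, which bases precede and follow it across all reads."""
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if i > 0:
                left[kmer][read[i - 1]] += 1
            if i + k < len(read):
                right[kmer][read[i + k]] += 1
    return left, right


def unique_ext(counter, min_count):
    """Return the single extension base seen at least min_count times, else None (none or a fork)."""
    good = [base for base, n in counter.items() if n >= min_count]
    return good[0] if len(good) == 1 else None


def uu_contigs(reads, k, min_count=2):
    """Build contigs by walking k-mers whose left and right extensions are both unique."""
    left, right = count_extensions(reads, k)
    uu = {}
    for kmer in set(left) & set(right):
        l_ext = unique_ext(left[kmer], min_count)
        r_ext = unique_ext(right[kmer], min_count)
        if l_ext is not None and r_ext is not None:
            uu[kmer] = (l_ext, r_ext)

    visited, contigs = set(), []
    for seed in sorted(uu):  # sorted only to make the output order deterministic
        if seed in visited:
            continue
        visited.add(seed)
        contig = cur = seed
        while True:  # extend to the right while the next k-mer agrees on the overlap
            nxt = cur[1:] + uu[cur][1]
            if nxt not in uu or nxt in visited or uu[nxt][0] != cur[0]:
                break
            contig += nxt[-1]
            visited.add(nxt)
            cur = nxt
        cur = seed
        while True:  # extend to the left under the same mutual-uniqueness rule
            prv = uu[cur][0] + cur[:-1]
            if prv not in uu or prv in visited or uu[prv][1] != cur[-1]:
                break
            contig = prv[0] + contig
            visited.add(prv)
            cur = prv
        contigs.append(contig)
    return contigs
```

For example, with two copies of an error-free read of a sequence whose 15-mers are all distinct, `uu_contigs(reads, 15)` returns one contig covering everything except the outermost base on each side (the first and last k-mers lack a left or right extension). Adding a third read that carries a single substitution leaves the result unchanged, because every extension introduced by the error is seen only once. A full assembler would additionally handle both strands, use base qualities, and record why each walk terminated.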

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Genomics Division
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
1056549
Report Number(s):
LBNL-5312E
Journal Information:
PLoS ONE, Vol. 6, Issue 8; ISSN 1932-6203
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English

Similar Records

An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies
Journal Article · Dec 13, 2014 · BMC Genomics

Meraculous2
Software · Jun 1, 2014

Omega: an Overlap-graph de novo Assembler for Meta-genomics
Journal Article · Jan 1, 2014 · Bioinformatics
