U.S. Department of Energy
Office of Scientific and Technical Information

in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

Journal Article · G3
 [1];  [2];  [2];  [3];  [2];  [1]
  1. Vanderbilt Univ., Nashville, TN (United States). Dept. of Biological Sciences
  2. Univ. of Wisconsin, Madison, WI (United States). Wisconsin Energy Inst., J. F. Crow Inst. for the Study of Evolution, Lab. of Genetics, Genome Center of Wisconsin, Dept. of Energy Great Lakes Bioenergy Research Center
  3. US Dept. of Agriculture (USDA), Peoria, IL (United States). National Center for Agricultural Utilization Research, Agricultural Research Service, Mycotoxin Prevention and Applied Microbiology Research Unit

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges of de novo genome sequencing projects and to streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategies and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS gives the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
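
To make the "data generation (through simulation)" step concrete, here is a minimal sketch of in silico sequencing. It is not iWGS code: the function names `simulate_reads` and `mean_depth` are hypothetical, and the sketch assumes error-free reads sampled uniformly from a known reference, which is a much simpler model than any real sequencing platform.

```python
import random


def simulate_reads(genome, read_len, coverage, seed=0):
    """Sample error-free reads uniformly from `genome` (toy model, an assumption).

    The number of reads is chosen so that the total number of sequenced bases
    is about `coverage` times the genome length.
    """
    rng = random.Random(seed)
    n_reads = (coverage * len(genome)) // read_len
    last_start = len(genome) - read_len  # last valid 0-based start position
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def mean_depth(reads, genome_len):
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return sum(len(r) for r in reads) / genome_len


if __name__ == "__main__":
    # Hypothetical 1 kb random reference, used only for illustration.
    ref_rng = random.Random(42)
    reference = "".join(ref_rng.choice("ACGT") for _ in range(1000))
    reads = simulate_reads(reference, read_len=50, coverage=20)
    print(len(reads), mean_depth(reads, len(reference)))
```

Varying `read_len` and `coverage` in a sketch like this mirrors, in a very reduced form, the idea of comparing candidate sequencing strategies on simulated data before committing to a real experiment.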

Research Organization:
Argonne National Laboratory-Advanced Photon Source, Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Institutes of Health (NIH); National Science Foundation (NSF)
Grant/Contract Number:
AC02-06CH11357; AC02-05CH11231; FC02-07ER64494
OSTI ID:
1373365
Alternate ID(s):
OSTI ID: 1378364
Journal Information:
G3, Vol. 6, Issue 11; ISSN 2160-1836
Publisher:
Genetics Society of America
Country of Publication:
United States
Language:
English
