DOE PAGES

Title: in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
Authors:
[1]; [2]; [2]; [3]; [2]; [1]
  1. Vanderbilt Univ., Nashville, TN (United States). Dept. of Biological Sciences
  2. Univ. of Wisconsin, Madison, WI (United States). Wisconsin Energy Inst., J. F. Crow Inst. for the Study of Evolution, Lab. of Genetics, Genome Center of Wisconsin, Dept. of Energy Great Lakes Bioenergy Research Center
  3. US Dept. of Agriculture (USDA), Peoria, IL (United States). National Center for Agricultural Utilization Research, Agricultural Research Service, Mycotoxin Prevention and Applied Microbiology Research Unit
Publication Date:
Grant/Contract Number:
AC02-06CH11357; AC02-05CH11231; FC02-07ER64494
Type:
Accepted Manuscript
Journal Name:
G3
Additional Journal Information:
Journal Volume: 6; Journal Issue: 11; Journal ID: ISSN 2160-1836
Publisher:
Genetics Society of America
Research Org:
Argonne National Lab. (ANL), Argonne, IL (United States). Advanced Photon Source (APS); Univ. of Wisconsin, Madison (United States); Lawrence Berkeley National Laboratory (LBNL), CA (United States)
Sponsoring Org:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Institutes of Health (NIH); National Science Foundation (NSF)
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Genome Sequencing; High-Throughput Sequencing; De Novo Assembly; Experimental Design; Simulation; Nonmodel Organism
OSTI Identifier:
1373365
Alternate Identifier(s):
OSTI ID: 1378364