Title: Complete Assembly of Circular and Chloroplast Genomes Based on Global Optimization

Journal Article · Journal of Bioinformatics and Computational Biology

This paper focuses on the last two stages of genome assembly, scaffolding and gap-filling, and shows that they can be solved as part of a single optimization problem. Our approach models genome assembly as the problem of finding a simple path in a specific graph that satisfies as many as possible of the distance constraints encoding the insert-size information. We formulate it as a mixed-integer linear programming problem and apply an optimization solver to find exact solutions on a benchmark of chloroplast genomes. We show that the presence of repetitions in the set of unitigs is the main reason for the existence of multiple equivalent solutions, which are associated with alternative subpaths. We also describe two sufficient conditions and design efficient algorithms for identifying these subpaths. We compare the results achieved by our tool with those obtained by recent assemblers.
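
To make the path formulation in the abstract concrete, here is a minimal sketch in Python. It is not the authors' mixed-integer linear program: it swaps in a brute-force enumeration of simple paths, which is only practical on toy graphs. All names (`best_simple_path`, `length`, `gap`, the constraint tuples) are hypothetical, and the sketch assumes each unitig occurs at most once on the path, so it does not model repeated unitigs.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
# (u, v, expected distance between the starts of u and v, tolerance)
Constraint = Tuple[str, str, int, int]


def positions(path: List[str], length: Dict[str, int],
              gap: Dict[Edge, int]) -> Dict[str, int]:
    """Start coordinate of each unitig when laid out along the path."""
    pos = {path[0]: 0}
    for prev, cur in zip(path, path[1:]):
        # A negative gap would model an overlap between consecutive unitigs.
        pos[cur] = pos[prev] + length[prev] + gap[(prev, cur)]
    return pos


def satisfied(path: List[str], length: Dict[str, int], gap: Dict[Edge, int],
              constraints: List[Constraint]) -> int:
    """Count distance constraints met by this path, within tolerance."""
    pos = positions(path, length, gap)
    return sum(1 for u, v, d, tol in constraints
               if u in pos and v in pos and abs(pos[v] - pos[u] - d) <= tol)


def best_simple_path(length: Dict[str, int], gap: Dict[Edge, int],
                     constraints: List[Constraint]) -> Tuple[int, List[str]]:
    """Exhaustively search simple paths for the most satisfied constraints."""
    succ: Dict[str, List[str]] = {}
    for u, v in gap:
        succ.setdefault(u, []).append(v)
    best_score, best_path = -1, []

    def extend(path: List[str], seen: set) -> None:
        nonlocal best_score, best_path
        score = satisfied(path, length, gap, constraints)
        # Ties are broken in favor of the longer path (an arbitrary choice).
        if score > best_score or (score == best_score
                                  and len(path) > len(best_path)):
            best_score, best_path = score, list(path)
        for nxt in succ.get(path[-1], []):
            if nxt not in seen:  # keep the path simple: no repeated unitig
                path.append(nxt)
                seen.add(nxt)
                extend(path, seen)
                path.pop()
                seen.remove(nxt)

    for start in sorted(length):
        extend([start], {start})
    return best_score, best_path
```

On a four-unitig toy graph with edges A→B, A→C, B→D, C→D and B→C, calling `best_simple_path` with three constraints that are all consistent with the layout A, B, C, D returns that path with a score of 3. An exact solver, as used in the paper, is what makes this kind of search feasible beyond toy sizes.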

Research Organization:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
89233218CNA000001
OSTI ID:
1526953
Report Number(s):
LA-UR-18-25924
Journal Information:
Journal of Bioinformatics and Computational Biology, Vol. 17, Issue 3; ISSN 0219-7200
Publisher:
World Scientific
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science