Complete Assembly of Circular and Chloroplast Genomes Based on Global Optimization
- Univ. of Rennes (France)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
This paper focuses on the last two stages of genome assembly, scaffolding and gap-filling, and shows that they can be solved as a single optimization problem. Our approach models genome assembly as the problem of finding a simple path in a specific graph that satisfies as many as possible of the distance constraints encoding the insert-size information. We formulate it as a mixed-integer linear programming problem and apply an optimization solver to find exact solutions on a benchmark of chloroplasts. We show that the presence of repetitions in the set of unitigs is the main reason for the existence of multiple equivalent solutions, which are associated with alternative subpaths. We also describe two sufficient conditions and design efficient algorithms for identifying these subpaths. We compare the results achieved by our tool with those obtained with recent assemblers.
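To make the modeling idea concrete, here is a minimal sketch of the objective only: pick a simple path through a unitig graph that satisfies the largest number of distance constraints. It is not the paper's method. The paper uses a mixed-integer linear programming formulation, whereas this sketch swaps in exhaustive enumeration, which is feasible only for toy graphs. The graph, unitig lengths, gap values, constraint format `(u, v, expected_distance)`, tolerance, and all function names are assumptions made for illustration.

```python
# Toy illustration (assumed data, hypothetical names): brute-force search for the
# simple path that satisfies the most distance constraints. Not the paper's MILP.

def simple_paths(edges, nodes):
    """Yield every simple path (no repeated vertex) in a directed graph."""
    def extend(path, seen):
        yield tuple(path)
        for nxt in edges.get(path[-1], {}):
            if nxt not in seen:
                path.append(nxt)
                seen.add(nxt)
                yield from extend(path, seen)
                path.pop()
                seen.remove(nxt)
    for start in nodes:
        yield from extend([start], {start})

def start_positions(path, lengths, edges):
    """Start coordinate of each unitig along the path (first unitig at 0)."""
    pos = {path[0]: 0}
    for u, v in zip(path, path[1:]):
        # next start = previous start + previous length + gap (negative = overlap)
        pos[v] = pos[u] + lengths[u] + edges[u][v]
    return pos

def satisfied(path, lengths, edges, constraints, tol=0):
    """Count constraints (u, v, d) with |pos[v] - pos[u] - d| <= tol on this path."""
    pos = start_positions(path, lengths, edges)
    return sum(
        1 for u, v, d in constraints
        if u in pos and v in pos and abs(pos[v] - pos[u] - d) <= tol
    )

def best_path(lengths, edges, constraints, tol=0):
    """Path maximizing satisfied constraints; ties broken by path length."""
    return max(
        simple_paths(edges, lengths),
        key=lambda p: (satisfied(p, lengths, edges, constraints, tol), len(p)),
    )

# Assumed toy instance: four unitigs, edge weights are gaps between them.
lengths = {"A": 100, "B": 50, "C": 80, "D": 60}
edges = {"A": {"B": 10, "C": 0}, "B": {"D": 5}, "C": {"D": 20}}
constraints = [("A", "B", 110), ("B", "D", 55), ("A", "D", 165)]

print(best_path(lengths, edges, constraints))
```

In this toy instance the two routes from A to D place D at different coordinates, so the distance constraints are what distinguish them. When two subpaths yield identical coordinates for all constrained unitigs, the objective cannot tell them apart, which is the kind of equivalence between alternative subpaths that the abstract attributes to repetitions.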
- Research Organization:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- Grant/Contract Number:
- 89233218CNA000001
- OSTI ID:
- 1526953
- Report Number(s):
- LA-UR-18-25924
- Journal Information:
- Journal of Bioinformatics and Computational Biology, Vol. 17, Issue 3; ISSN 0219-7200
- Publisher:
- World Scientific
- Country of Publication:
- United States
- Language:
- English