Abstract
We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long-read sequencers. While long sequences of DNA offer enormous advantages for biological analysis and insight, current long-read sequencing instruments have high error rates and therefore require different approaches to analysis than their short-read counterparts. Our work focuses on an efficient distributed-memory parallelization of an accurate single-node algorithm for overlapping and aligning long reads. We achieve scalability of this irregular algorithm by addressing the competing issues of increasing parallelism, minimizing communication, constraining the memory footprint, and ensuring good load balance. The resulting application, DiBELLA, is the first distributed-memory overlapper and aligner specifically designed for long reads and parallel scalability.
- Developers:
- Ellis, Marquita [1]
- [1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Release Date:
- 2020-10-22
- Project Type:
- Open Source, No Publicly Available Repository
- Software Type:
- Scientific
- Licenses:
- BSD 3-clause "New" or "Revised" License
- Sponsoring Org.:
- USDOE; Primary Award/Contract Number: AC02-05CH11231
- Oak Ridge National Laboratory; Primary Award/Contract Number: AWD1896; Other Award/Contract Number: AWD3408
- Code ID:
- 52870
- Site Accession Number:
- 2020-158
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Country of Origin:
- United States
Citation Formats
Ellis, Marquita. Distributed Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper (DiBELLA) v1.0.0. Computer Software. USDOE, Oak Ridge National Laboratory. 22 Oct. 2020. Web. doi:10.11578/dc.20210316.1.
Ellis, Marquita. (2020, October 22). Distributed Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper (DiBELLA) v1.0.0. [Computer software]. https://doi.org/10.11578/dc.20210316.1.
Ellis, Marquita. "Distributed Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper (DiBELLA) v1.0.0." Computer software. October 22, 2020. https://doi.org/10.11578/dc.20210316.1.
@misc{doecode_52870,
title = {Distributed Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper (DiBELLA) v1.0.0},
author = {Ellis, Marquita},
abstractNote = {We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long read sequencers. While long sequences of DNA offer enormous advantages for biological analysis and insight, current long read sequencing instruments have high error rates and therefore require different approaches to analysis than their short read counterparts. Our work focuses on an efficient distributed-memory parallelization of an accurate single-node algorithm for overlapping and aligning long reads. We achieve scalability of this irregular algorithm by addressing the competing issues of increasing parallelism, minimizing communication, constraining the memory footprint, and ensuring good load balance. The resulting application, DiBELLA, is the first distributed memory overlapper and aligner specifically designed for long reads and parallel scalability.},
doi = {10.11578/dc.20210316.1},
url = {https://doi.org/10.11578/dc.20210316.1},
howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20210316.1}},
year = {2020},
month = oct
}