U.S. Department of Energy
Office of Scientific and Technical Information

diBELLA: Distributed Long Read to Long Read Alignment

Conference ·
[1]; [1]; [1]; [2]; [1]; [1]
  1. Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
© 2019 ACM. We present a parallel algorithm and scalable implementation for genome analysis, specifically for finding overlaps and alignments in data from "third generation" long-read sequencers [29]. Long DNA sequences offer enormous advantages for biological analysis and insight. However, current long-read sequencing instruments have high error rates and therefore require different analysis approaches than their short-read counterparts. Our work focuses on an efficient distributed-memory parallelization of an accurate single-node algorithm for overlapping and aligning long reads. We make this irregular algorithm scale by balancing competing concerns: increasing parallelism, minimizing communication, constraining the memory footprint, and ensuring good load balance. The resulting application, diBELLA, is the first distributed-memory overlapper and aligner designed specifically for long reads and parallel scalability. We describe the high-level design trade-offs, present analyses of them, and conduct an extensive empirical study comparing performance across state-of-the-art HPC systems as well as a commercial cloud architecture, highlighting the advantages of state-of-the-art network technologies.
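The abstract describes the parallel strategy only at a high level. The sketch below illustrates one common way to distribute k-mer-based overlap detection. It is an assumption for illustration, not a description of diBELLA's actual code. It simulates `nprocs` ranks inside a single Python process. Each k-mer occurrence is routed to an owner rank chosen by hashing, which spreads work roughly evenly. Each owner discards k-mers that occur too rarely (likely sequencing errors) or too often (likely repeats), then pairs up the reads that share a retained k-mer as candidates for alignment. The names `owner`, `kmers`, and `candidate_overlaps` and the thresholds `min_count`/`max_count` are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import combinations


def owner(kmer: str, nprocs: int) -> int:
    """Map a k-mer to the simulated rank that owns it (deterministic hash)."""
    return zlib.crc32(kmer.encode()) % nprocs


def kmers(seq: str, k: int):
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def candidate_overlaps(reads, k=5, nprocs=4, min_count=2, max_count=4):
    """Return read-id pairs (a, b), a < b, that share a retained k-mer.

    Hypothetical sketch: min_count/max_count are illustrative thresholds,
    and the per-rank "outboxes" stand in for an all-to-all message exchange.
    """
    # Phase 1: route every k-mer occurrence (read id, position) to its owner rank.
    outboxes = [defaultdict(list) for _ in range(nprocs)]
    for read_id, seq in enumerate(reads):
        for pos, km in kmers(seq, k):
            outboxes[owner(km, nprocs)][km].append((read_id, pos))

    # Phase 2: each owner filters its k-mers by occurrence count and
    # emits every pair of distinct reads that share a surviving k-mer.
    pairs = set()
    for table in outboxes:
        for km, occ in table.items():
            if not (min_count <= len(occ) <= max_count):
                continue  # too rare (likely an error) or too frequent (likely a repeat)
            read_ids = sorted({r for r, _ in occ})
            for a, b in combinations(read_ids, 2):
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    reads = ["ACGTACGTTGCA", "CGTTGCAAAC", "TTTTTTTTTT"]
    print(candidate_overlaps(reads, k=5))  # {(0, 1)}
```

Ownership depends only on each k-mer's hash, so the candidate set is the same for any `nprocs`. In a genuinely distributed run, Phase 1 becomes network communication and each rank's table must fit in its own memory. This is where the communication, memory-footprint, and load-balance trade-offs named in the abstract come into play.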
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC02-05CH11231
OSTI ID:
1602840
Country of Publication:
United States
Language:
English

References (10)

Shouji: a fast and efficient pre-alignment filter for sequence alignment · Journal Article · March 2019
Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation · Journal Article · March 2017
Landscape of Next-Generation Sequencing Technologies · Journal Article · June 2011
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data · Journal Article · May 2013
A bridging model for parallel computation · Journal Article · August 1990
Phased diploid genome assembly with single-molecule real-time sequencing · Journal Article · October 2016
Space/time trade-offs in hash coding with allowable errors · Journal Article · July 1970
Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory · Journal Article · September 2012
Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data · Journal Article · November 2017
HipMer: an extreme-scale de novo genome assembler · Georganas, Evangelos; Buluç, Aydın; Chapman, Jarrod · Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15 · https://doi.org/10.1145/2807591.2807664 · Conference · January 2015

Similar Records

Distributed Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper (DiBELLA) v1.0.0
Software · Wed Oct 21 20:00:00 EDT 2020 · OSTI ID:code-52870

Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper (BELLA) v1.0
Software · Tue May 08 20:00:00 EDT 2018 · OSTI ID:code-17092

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly
Journal Article · Fri Apr 30 20:00:00 EDT 2021 · Proceedings - IEEE International Parallel and Distributed Processing Symposium (IPDPS) · OSTI ID:1818231
