OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: diBELLA: Distributed Long Read to Long Read Alignment

Abstract

© 2019 ACM. We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long read sequencers [29]. While long sequences of DNA offer enormous advantages for biological analysis and insight, current long read sequencing instruments have high error rates and therefore require different approaches to analysis than their short read counterparts. Our work focuses on an efficient distributed-memory parallelization of an accurate single-node algorithm for overlapping and aligning long reads. We achieve scalability of this irregular algorithm by addressing the competing issues of increasing parallelism, minimizing communication, constraining the memory footprint, and ensuring good load balance. The resulting application, diBELLA, is the first distributed-memory overlapper and aligner specifically designed for long reads and parallel scalability. We describe and analyze high-level design trade-offs and conduct an extensive empirical analysis that compares performance characteristics across state-of-the-art HPC systems as well as a commercial cloud architecture, highlighting the advantages of state-of-the-art network technologies.
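The abstract does not spell out how overlaps are found. As a hedged illustration only, the sketch below assumes a generic k-mer seeding approach to candidate overlap detection in distributed memory: every k-mer occurrence is routed to an "owner" rank chosen by hashing the k-mer, and each owner then pairs up the reads that share a k-mer. Ranks are simulated with a list of per-rank tables instead of real message passing. The function names (`kmers`, `owner`, `candidate_overlaps`) and the `max_occ` repeat cutoff are hypothetical; this is not diBELLA's code, and the pairwise alignment of each candidate pair is omitted.

```python
"""Minimal sketch (hypothetical, not diBELLA's code): hash-partitioned k-mer
seeding to find candidate overlapping read pairs, with ranks simulated locally."""
from collections import defaultdict
from itertools import combinations
import zlib


def kmers(seq, k):
    """Yield (offset, k-mer) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def owner(kmer, nprocs):
    """Hypothetical owner-rank function: a stable hash of the k-mer mod nprocs."""
    return zlib.crc32(kmer.encode()) % nprocs


def candidate_overlaps(reads, k, nprocs, max_occ=4):
    """Return {(read_i, read_j): [(pos_i, pos_j), ...]} for read pairs sharing a k-mer."""
    # Exchange step: route every k-mer occurrence to the table of its owner rank.
    # In a real distributed code this would be an all-to-all message exchange.
    tables = [defaultdict(list) for _ in range(nprocs)]
    for rid, seq in reads.items():
        for pos, km in kmers(seq, k):
            tables[owner(km, nprocs)][km].append((rid, pos))

    # Local step: each owner pairs up reads that share a k-mer, independently.
    pairs = defaultdict(list)
    for table in tables:
        for occ in table.values():
            if len(occ) > max_occ:  # assumed cutoff: skip likely-repetitive k-mers
                continue
            for (r1, p1), (r2, p2) in combinations(occ, 2):
                if r1 == r2:
                    continue  # a k-mer repeated within one read is not an overlap
                if r1 > r2:  # store each pair under a canonical (smaller, larger) key
                    (r1, p1), (r2, p2) = (r2, p2), (r1, p1)
                pairs[(r1, r2)].append((p1, p2))
    return dict(pairs)


if __name__ == "__main__":
    reads = {"a": "ACGTACGGTCA", "b": "CGGTCAGGAT", "c": "TTTTGGGG"}
    print(candidate_overlaps(reads, k=5, nprocs=4))
```

Because ownership is decided by a hash rather than a shared index, each rank can scan its own table without further coordination after the single exchange, and the result does not depend on the number of ranks. The `max_occ` cutoff caps the quadratic pairing work for highly repeated k-mers, one simple way to limit memory growth and load imbalance; each surviving candidate pair would still need a pairwise alignment, which this sketch does not attempt.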

Authors:
Ellis, Marquita [1]; Guidi, Giulia [1]; Buluc, Aydin [1]; Oliker, Leonid [2]; Yelick, Katherine [1]; Machinery, Assoc Comp [1]
  1. Univ. of California, Berkeley, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
2019
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1602840
DOE Contract Number:  
AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 48th International Conference on Parallel Processing (ICPP 2019), August 2019, Kyoto, Japan
Country of Publication:
United States
Language:
English

Citation Formats

Ellis, Marquita, Guidi, Giulia, Buluc, Aydin, Oliker, Leonid, Yelick, Katherine, and Machinery, Assoc Comp. diBELLA: Distributed Long Read to Long Read Alignment. United States: N. p., 2019. Web. doi:10.1145/3337821.3337919.
Ellis, Marquita, Guidi, Giulia, Buluc, Aydin, Oliker, Leonid, Yelick, Katherine, & Machinery, Assoc Comp. diBELLA: Distributed Long Read to Long Read Alignment. United States. doi:10.1145/3337821.3337919.
Ellis, Marquita, Guidi, Giulia, Buluc, Aydin, Oliker, Leonid, Yelick, Katherine, and Machinery, Assoc Comp. 2019. "diBELLA: Distributed Long Read to Long Read Alignment". United States. doi:10.1145/3337821.3337919. https://www.osti.gov/servlets/purl/1602840.
@inproceedings{osti_1602840,
title = {diBELLA: Distributed Long Read to Long Read Alignment},
author = {Ellis, Marquita and Guidi, Giulia and Buluc, Aydin and Oliker, Leonid and Yelick, Katherine and Machinery, Assoc Comp},
booktitle = {Proceedings of the 48th International Conference on Parallel Processing (ICPP 2019)},
abstractNote = {© 2019 ACM. We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long read sequencers [29]. While long sequences of DNA offer enormous advantages for biological analysis and insight, current long read sequencing instruments have high error rates and therefore require different approaches to analysis than their short read counterparts. Our work focuses on an efficient distributed-memory parallelization of an accurate single-node algorithm for overlapping and aligning long reads. We achieve scalability of this irregular algorithm by addressing the competing issues of increasing parallelism, minimizing communication, constraining the memory footprint, and ensuring good load balance. The resulting application, diBELLA, is the first distributed-memory overlapper and aligner specifically designed for long reads and parallel scalability. We describe and analyze high-level design trade-offs and conduct an extensive empirical analysis that compares performance characteristics across state-of-the-art HPC systems as well as a commercial cloud architecture, highlighting the advantages of state-of-the-art network technologies.},
doi = {10.1145/3337821.3337919},
place = {United States},
year = {2019},
month = {8}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.


Works referenced in this record:

Shouji: a fast and efficient pre-alignment filter for sequence alignment
journal, March 2019

Space/time trade-offs in hash coding with allowable errors
journal, July 1970

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory
journal, September 2012

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data
journal, May 2013

  • Chin, Chen-Shan; Alexander, David H.; Marks, Patrick
  • Nature Methods, Vol. 10, Issue 6
  • DOI: 10.1038/nmeth.2474

Phased diploid genome assembly with single-molecule real-time sequencing
journal, October 2016

  • Chin, Chen-Shan; Peluso, Paul; Sedlazeck, Fritz J.
  • Nature Methods, Vol. 13, Issue 12
  • DOI: 10.1038/nmeth.4035

HipMer: an extreme-scale de novo genome assembler
conference, January 2015

  • Georganas, Evangelos; Buluç, Aydın; Chapman, Jarrod
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '15
  • DOI: 10.1145/2807591.2807664

Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data
journal, November 2017

  • Jayakumar, Vasanthan; Sakakibara, Yasubumi
  • Briefings in Bioinformatics, Vol. 20, Issue 3
  • DOI: 10.1093/bib/bbx147

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation
journal, March 2017

  • Koren, Sergey; Walenz, Brian P.; Berlin, Konstantin
  • Genome Research, Vol. 27, Issue 5
  • DOI: 10.1101/gr.215087.116

Landscape of Next-Generation Sequencing Technologies
journal, June 2011

  • Niedringhaus, Thomas P.; Milanova, Denitsa; Kerby, Matthew B.
  • Analytical Chemistry, Vol. 83, Issue 12
  • DOI: 10.1021/ac2010857

A bridging model for parallel computation
journal, August 1990