
Title: Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems.

Abstract

De novo genome assembly is one of the most important and challenging computational problems in modern genomics; further, it shares algorithms and communication patterns important to other graph analytic and irregular applications. Unlike simulations, it has no floating-point arithmetic and is dominated by small memory transactions within and between computing nodes. In this work, we focus on the highly scalable HipMer assembler, identify its dominant algorithms and communication patterns, and use microbenchmarks to capture the workload. We evaluate HipMer on a variety of platforms, from the latest HPC systems to Ethernet clusters. HipMer performs well on all single-node systems, including the Xeon Phi manycore architecture. Given large enough problems, it also demonstrates excellent scaling across nodes in an HPC system, but requires a high-speed network with low overhead and high injection rates. Our results shed light on the architectural features that are most important for achieving good parallel efficiency on this and related problems.

Authors:
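 Ellis, Marquita [1]; Georganas, Evangelos [1]; Egan, Rob [2]; Hofmeyr, Steven [1]; Buluc, Aydin [1]; Cook, Brandon [3]; Oliker, Leonid [3]; Yelick, Katherine [1]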
  1. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA
  2. Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA
  3. National Energy Research Scientific Computing Center, Berkeley, CA
Publication Date:
August 2017
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567514
Resource Type:
Conference
Journal Name:
Lecture Notes in Computer Science, vol 10417. Springer, Cham
Additional Journal Information:
Conference: Euro-Par 2017, Galicia, Spain, 28 August-1 September 2017.
Country of Publication:
United States
Language:
English

Citation Formats

Ellis, Marquita, Georganas, Evangelos, Egan, Rob, Hofmeyr, Steven, Buluc, Aydin, Cook, Brandon, Oliker, Leonid, and Yelick, Katherine. Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems. United States: N. p., 2017. Web. doi:10.1007/978-3-319-64203-1_6.
Ellis, Marquita, Georganas, Evangelos, Egan, Rob, Hofmeyr, Steven, Buluc, Aydin, Cook, Brandon, Oliker, Leonid, & Yelick, Katherine. Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems. United States. doi:10.1007/978-3-319-64203-1_6.
Ellis, Marquita, Georganas, Evangelos, Egan, Rob, Hofmeyr, Steven, Buluc, Aydin, Cook, Brandon, Oliker, Leonid, and Yelick, Katherine. 2017. "Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems." United States. doi:10.1007/978-3-319-64203-1_6.
@article{osti_1567514,
title = {Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems.},
author = {Ellis, Marquita and Georganas, Evangelos and Egan, Rob and Hofmeyr, Steven and Buluc, Aydin and Cook, Brandon and Oliker, Leonid and Yelick, Katherine},
doi = {10.1007/978-3-319-64203-1_6},
journal = {Lecture Notes in Computer Science, vol 10417. Springer, Cham},
place = {United States},
year = {2017},
month = {8}
}

