U.S. Department of Energy
Office of Scientific and Technical Information

Performance Characterization of De Novo Genome Assembly on Leading Parallel Systems.

Conference · Lecture Notes in Computer Science, vol 10417. Springer, Cham
 [1];  [1];  [2];  [1];  [1];  [3];  [3];  [1]
  1. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA
  2. Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA
  3. National Energy Research Scientific Computing Center, Berkeley, CA
De novo genome assembly is one of the most important and challenging computational problems in modern genomics; it also shares algorithms and communication patterns with other graph-analytic and irregular applications. Unlike simulations, it has no floating-point arithmetic and is dominated by small memory transactions within and between computing nodes. In this work, we focus on the highly scalable HipMer assembler, identify its dominant algorithms and communication patterns, and use microbenchmarks to capture the workload. We evaluate HipMer on a variety of platforms, from the latest HPC systems to Ethernet clusters. HipMer performs well on all single-node systems, including the Xeon Phi manycore architecture. Given large enough problems, it also demonstrates excellent scaling across nodes in an HPC system, but requires a high-speed network with low overhead and high injection rates. Our results shed light on the architectural features that are most important for achieving good parallel efficiency on this and related problems.
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science; USDOE
OSTI ID:
1567514
Conference Information:
Journal Name: Lecture Notes in Computer Science, vol 10417. Springer, Cham
Country of Publication:
United States
Language:
English
