OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: ADEPT: a domain independent sequence alignment strategy for gpu architectures

Journal Article · BMC Bioinformatics

Bioinformatic workflows frequently make use of automated genome assembly and protein clustering tools. At the core of most of these tools, a significant portion of execution time is spent determining the optimal local alignment between two sequences. This task is performed with the Smith-Waterman algorithm, a dynamic-programming-based method. With the advent of modern sequencing technologies and the increasing size of both genome and protein databases, a need for faster Smith-Waterman implementations has emerged. Multiple SIMD strategies for the Smith-Waterman algorithm are available for CPUs. However, with the move of HPC facilities towards accelerator-based architectures, a need for an efficient GPU-accelerated strategy has emerged. Existing GPU-based strategies have been optimized either for a specific type of character (nucleotides or amino acids) or for only a handful of application use-cases. In this paper, we present ADEPT, a new sequence alignment strategy for GPU architectures that is domain independent, supporting alignment of sequences from both genomes and proteins. Our proposed strategy uses GPU-specific optimizations that do not rely on the nature of the sequence. We demonstrate the feasibility of this strategy by implementing the Smith-Waterman algorithm and comparing it to similar CPU strategies as well as the fastest known GPU methods for each domain. ADEPT's driver enables it to scale across multiple GPUs and allows easy integration into software pipelines that utilize large-scale computational systems. We have shown that the ADEPT-based Smith-Waterman algorithm demonstrates a peak performance of 360 GCUPS and 497 GCUPS for protein-based and DNA-based datasets respectively on a single GPU node (8 GPUs) of the Cori supercomputer. Overall, ADEPT shows 10x faster performance in a node-to-node comparison against a corresponding SIMD CPU implementation. ADEPT demonstrates performance that is comparable to or better than that of existing GPU strategies.
We demonstrated the efficacy of ADEPT in supporting existing bioinformatics software pipelines by integrating ADEPT into MetaHipMer, a high-performance de novo metagenome assembler, and PASTIS, a high-performance protein similarity graph construction pipeline. Our results show performance improvements of 10% and 30% in MetaHipMer and PASTIS respectively.
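
For readers unfamiliar with the terms used in the abstract, the following is a minimal CPU sketch of the Smith-Waterman local alignment score and of the GCUPS metric (giga cell updates per second, where one cell is one entry of the dynamic programming matrix). It is not ADEPT's GPU implementation. The function names, the match/mismatch/gap values, and the linear gap penalty are assumptions chosen for illustration; protein alignments would normally use a substitution matrix instead of a simple match/mismatch rule.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Illustrative scoring only (assumed values): a fixed match/mismatch
    score and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(cell_updates, seconds):
    """Giga cell updates per second for a batch of alignments."""
    return cell_updates / seconds / 1e9


if __name__ == "__main__":
    query, target = "TTACGTTT", "GGACGGG"
    print(smith_waterman_score(query, target))
    # One alignment updates len(query) * len(target) cells.
    print(gcups(len(query) * len(target), 1e-6))
```

Each cell depends only on its left, upper, and upper-left neighbours, which is why cells on the same anti-diagonal can be computed in parallel on SIMD units or GPU threads.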

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC); Exascale Computing Project
Grant/Contract Number:
AC02-05CH11231; 17-SC-20-SC
OSTI ID:
1706662
Journal Information:
BMC Bioinformatics, Vol. 21, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
