U.S. Department of Energy
Office of Scientific and Technical Information

Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

Journal Article · BMC Bioinformatics
 [1]
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
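
To make the three alignment classes and the GCUPS metric in the abstract concrete, the following is a minimal scalar sketch in Python. It is not parasail's API and uses no SIMD. The function names `align_score` and `gcups`, the match/mismatch scores, and the linear gap penalty are illustrative assumptions; the library itself is written in C and may use a different gap model and score matrices.

```python
def align_score(query, target, mode="local", match=2, mismatch=-1, gap=-2):
    """Scalar reference score for one pairwise alignment (illustrative only).

    Assumptions: simple match/mismatch scoring and a linear gap penalty.
    mode is "local" (Smith-Waterman), "global" (Needleman-Wunsch), or
    "semi-global" (gaps at the start and end of either sequence are free).
    """
    if mode not in ("local", "global", "semi-global"):
        raise ValueError("unknown mode: " + mode)
    n, m = len(query), len(target)
    # Global alignment charges for leading gaps; the other modes start at 0.
    prev = [j * gap if mode == "global" else 0 for j in range(m + 1)]
    best_local = 0            # best cell anywhere in the table (local mode)
    best_last_col = prev[m]   # best cell in the last column (semi-global mode)
    for i in range(1, n + 1):
        curr = [i * gap if mode == "global" else 0]
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            cell = max(prev[j - 1] + sub,  # align the two residues
                       prev[j] + gap,      # gap in the target
                       curr[j - 1] + gap)  # gap in the query
            if mode == "local":
                cell = max(cell, 0)        # a local alignment may restart
                best_local = max(best_local, cell)
            curr.append(cell)
        prev = curr
        best_last_col = max(best_last_col, curr[m])
    if mode == "local":
        return best_local
    if mode == "global":
        return prev[m]                     # bottom-right cell
    return max(max(prev), best_last_col)   # best of last row and last column


def gcups(query_len, database_residues, seconds):
    """Billions of dynamic-programming cell updates per second."""
    return query_len * database_residues / seconds / 1e9
```

With these assumed scores, aligning `"ACGT"` against `"TTACGTTT"` shows how the modes differ: the local and semi-global variants can ignore the flanking residues, while the global variant must pay for them as gaps. For the throughput metric, a 375-residue query compared against a hypothetical database of 2 billion residues in 10 seconds would correspond to `gcups(375, 2_000_000_000, 10.0)`, that is, 75 GCUPS.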

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1243170
Report Number(s):
PNNL-SA--114370
Journal Information:
BMC Bioinformatics, Vol. 17; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English

References (21)

An improved algorithm for matching biological sequences journal December 1982
A work stealing based approach for enabling scalable optimal sequence homology detection journal May 2015
Basic local alignment search tool journal October 1990
Using video-oriented instructions to speed up sequence comparison journal January 1997
Striped Smith-Waterman speeds database searches six times over other SIMD implementations journal November 2006
UniProt: a hub for protein information journal October 2014
BERTology Meets Biology: Interpreting Attention in Protein Language Models posted_content July 2020
BLAST+: architecture and applications journal January 2009
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation journal June 2011
SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2 journal January 2008
SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications journal December 2013
Improved sensitivity of nucleic acid database searches using application-specific scoring matrices journal August 1991
Coriander Genomics Database: a genomic, transcriptomic, and metabolic database for coriander journal April 2020
Amino acid substitution matrices from protein blocks. journal November 1992
Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors journal August 2000
Enhanced Suffix Arrays and Applications book December 2005
Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale report May 2015
SWPS3 - fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2 text January 2008
SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications text January 2012

Cited By (19)

Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon journal November 2018
xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria posted_content September 2017
Introducing difference recurrence relations for faster semi-global alignment of long sequences journal February 2018
Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information journal November 2017
A Systematic Approach to Bacterial Phylogeny Using Order Level Sampling and Identification of HGT Using Network Science journal February 2020
A review of alignment based similarity measures for web usage mining journal May 2019
SWIMM 2.0: Enhanced Smith–Waterman on Intel’s Multicore and Manycore Architectures Based on AVX-512 Vector Extensions journal July 2018
Demonstration of End-to-End Automation of DNA Data Storage journal March 2019
Genome analysis of Mycobacterium avium subspecies hominissuis strain 109 journal December 2018
De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm journal April 2020
Vargas: heuristic-free alignment for assessing linear and graph read aligners journal April 2020
Edlib: a C/C++ library for fast, exact sequence alignment using edit distance journal January 2017
Minimap2: pairwise alignment for nucleotide sequences journal May 2018
Edlib: a C/C++ library for fast, exact sequence alignment using edit distance posted_content August 2016
Vargas: heuristic-free alignment for assessing linear and graph read aligners posted_content March 2020
Demonstration of End-to-End Automation of DNA Data Storage posted_content October 2018
SPAligner: Alignment of Long Diverged Molecular Sequences to Assembly Graphs posted_content August 2019
xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria journal February 2018
Minimap2: pairwise alignment for nucleotide sequences text January 2017


Similar Records

Pairwise Sequence Alignment Library
Software · Wed May 20 00:00:00 EDT 2015 · OSTI ID:1232140

Pairwise Sequence Alignment Library
Software · Mon May 18 20:00:00 EDT 2015 · OSTI ID:code-3402