Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments
Abstract
Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
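The three alignment classes named in the abstract differ mainly in how the dynamic-programming table is initialized and where the final score is read. The following is a minimal scalar sketch of that difference, not parasail's SIMD implementation: it assumes a simple match/mismatch score and a linear gap penalty rather than a substitution matrix with affine gaps, and the names `align_score` and `gcups` are hypothetical helpers introduced only for illustration.

```python
def align_score(query, target, mode="local", match=2, mismatch=-1, gap=-2):
    """Optimal alignment score under a linear gap penalty (illustrative only).

    mode: "global" (Needleman-Wunsch), "semi-global" (end gaps are free),
    or "local" (Smith-Waterman).
    """
    if mode not in ("global", "semi-global", "local"):
        raise ValueError("unknown mode: " + mode)
    n, m = len(query), len(target)
    # First row: only global alignment pays for leading gaps.
    prev = [j * gap if mode == "global" else 0 for j in range(m + 1)]
    best_local = 0            # running maximum over all cells (local)
    best_last_col = prev[m]   # maximum over the last column (semi-global)
    for i in range(1, n + 1):
        # First column follows the same rule as the first row.
        curr = [i * gap if mode == "global" else 0]
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            score = max(prev[j - 1] + sub,  # align the two residues
                        prev[j] + gap,      # gap in the target
                        curr[j - 1] + gap)  # gap in the query
            if mode == "local":
                score = max(score, 0)       # a local alignment may restart
                best_local = max(best_local, score)
            curr.append(score)
        best_last_col = max(best_last_col, curr[m])
        prev = curr
    if mode == "global":
        return prev[m]                        # bottom-right cell
    if mode == "semi-global":
        return max(max(prev), best_last_col)  # best of last row and column
    return best_local


def gcups(query_len, database_residues, seconds):
    """Billions of table cells updated per second."""
    return query_len * database_residues / seconds / 1e9
```

For example, `align_score("ACGT", "TTACGTTT", "global")` charges for the four unmatched target residues, whereas the semi-global and local modes do not. The GCUPS figure quoted in the abstract is this kind of ratio: query length times total database residues, divided by run time.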
- Authors:
- Daily, Jeffrey A.
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Publication Date:
- 2016-02-10
- Research Org.:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1243170
- Report Number(s):
- PNNL-SA-114370; Journal ID: ISSN 1471-2105
- Grant/Contract Number:
- AC05-76RL01830
- Resource Type:
- Accepted Manuscript
- Journal Name:
- BMC Bioinformatics
- Additional Journal Information:
- Journal Volume: 17; Journal ID: ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Smith-Waterman; Needleman-Wunsch; semi-global alignment; sequence alignment; SIMD; database search
Citation Formats
Daily, Jeffrey A. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. United States: N. p., 2016. Web. doi:10.1186/s12859-016-0930-z.
Daily, Jeffrey A. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. United States. https://doi.org/10.1186/s12859-016-0930-z
Daily, Jeffrey A. 2016. "Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments". United States. https://doi.org/10.1186/s12859-016-0930-z. https://www.osti.gov/servlets/purl/1243170.
@article{osti_1243170,
title = {Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments},
author = {Daily, Jeffrey A.},
abstractNote = {Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.},
doi = {10.1186/s12859-016-0930-z},
journal = {BMC Bioinformatics},
volume = 17,
place = {United States},
year = {2016},
month = {2}
}
Works referenced in this record:
UniProt: a hub for protein information
journal, October 2014
- The UniProt Consortium
- Nucleic Acids Research, Vol. 43, Issue D1, p. D204-D212
Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors
journal, August 2000
- Rognes, T.; Seeberg, E.
- Bioinformatics, Vol. 16, Issue 8
SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2
journal, January 2008
- Szalkowski, Adam; Ledergerber, Christian; Krähenbühl, Philipp
- BMC Research Notes, Vol. 1, Issue 1
Striped Smith-Waterman speeds database searches six times over other SIMD implementations
journal, November 2006
- Farrar, M.
- Bioinformatics, Vol. 23, Issue 2
An improved algorithm for matching biological sequences
journal, December 1982
- Gotoh, Osamu
- Journal of Molecular Biology, Vol. 162, Issue 3
Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation
journal, June 2011
- Rognes, Torbjørn
- BMC Bioinformatics, Vol. 12, Issue 1
SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications
journal, December 2013
- Zhao, Mengyao; Lee, Wan-Ping; Garrison, Erik P.
- PLoS ONE, Vol. 8, Issue 12
Improved sensitivity of nucleic acid database searches using application-specific scoring matrices
journal, August 1991
- States, D.; Gish, W.; Altschul, S.
- Methods, Vol. 3, Issue 1
BLAST+: architecture and applications
journal, January 2009
- Camacho, Christiam; Coulouris, George; Avagyan, Vahram
- BMC Bioinformatics, Vol. 10, Issue 1
Basic local alignment search tool
journal, October 1990
- Altschul, Stephen F.; Gish, Warren; Miller, Webb
- Journal of Molecular Biology, Vol. 215, Issue 3, p. 403-410
Amino acid substitution matrices from protein blocks.
journal, November 1992
- Henikoff, S.; Henikoff, J. G.
- Proceedings of the National Academy of Sciences, Vol. 89, Issue 22, p. 10915-10919
A work stealing based approach for enabling scalable optimal sequence homology detection
journal, May 2015
- Daily, Jeff; Kalyanaraman, Ananth; Krishnamoorthy, Sriram
- Journal of Parallel and Distributed Computing, Vol. 79-80
Enhanced Suffix Arrays and Applications
book, December 2005
- Abouelhoda, Mohamed; Kurtz, Stefan; Ohlebusch, Enno
- Chapman & Hall/CRC Computer & Information Science Series
Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale
report, May 2015
- Daily, Jeffrey A.
Coriander Genomics Database: a genomic, transcriptomic, and metabolic database for coriander
journal, April 2020
- Song, Xiaoming; Nie, Fulei; Chen, Wei
- Horticulture Research, Vol. 7, Issue 1
Enhanced Suffix Arrays and Applications
book, December 2005
- Aluru, Srinivas
- Handbook of Computational Molecular Biology
SWPS3 - fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2
text, January 2008
- Szalkowski, Adam; Ledergerber, Christian; Krähenbühl, Philipp
- ETH Zurich
SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications
text, January 2012
- Zhao, Mengyao; Lee, Wan-Ping; Garrison, Erik
- arXiv
Using video-oriented instructions to speed up sequence comparison
journal, January 1997
- Wozniak, A.
- Bioinformatics, Vol. 13, Issue 2
Works referencing / citing this record:
A review of alignment based similarity measures for web usage mining
journal, May 2019
- Luu, Vinh-Trung; Forestier, Germain; Weber, Jonathan
- Artificial Intelligence Review, Vol. 53, Issue 3
SWIMM 2.0: Enhanced Smith–Waterman on Intel’s Multicore and Manycore Architectures Based on AVX-512 Vector Extensions
journal, July 2018
- Rucci, Enzo; Garcia Sanchez, Carlos; Botella Juan, Guillermo
- International Journal of Parallel Programming, Vol. 47, Issue 2
Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon
journal, November 2018
- Sahlin, Kristoffer; Tomaszkiewicz, Marta; Makova, Kateryna D.
- Nature Communications, Vol. 9, Issue 1
Demonstration of End-to-End Automation of DNA Data Storage
journal, March 2019
- Takahashi, Christopher N.; Nguyen, Bichlien H.; Strauss, Karin
- Scientific Reports, Vol. 9, Issue 1
Genome analysis of Mycobacterium avium subspecies hominissuis strain 109
journal, December 2018
- Matern, William M.; Bader, Joel S.; Karakousis, Petros C.
- Scientific Data, Vol. 5, Issue 1
Edlib: a C/C++ library for fast, exact sequence alignment using edit distance
journal, January 2017
- Šošić, Martin; Šikić, Mile
- Bioinformatics, Vol. 33, Issue 9
Minimap2: pairwise alignment for nucleotide sequences
journal, May 2018
- Li, Heng
- Bioinformatics, Vol. 34, Issue 18
Vargas: heuristic-free alignment for assessing linear and graph read aligners
posted_content, March 2020
- Darby, Charlotte A.; Gaddipati, Ravi; Schatz, Michael C.
- Bioinformatics
SPAligner: Alignment of Long Diverged Molecular Sequences to Assembly Graphs
posted_content, August 2019
- Dvorkina, Tatiana; Antipov, Dmitry; Korobeynikov, Anton
- BMC Bioinformatics
Introducing difference recurrence relations for faster semi-global alignment of long sequences
journal, February 2018
- Suzuki, Hajime; Kasahara, Masahiro
- BMC Bioinformatics, Vol. 19, Issue S1
xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria
journal, February 2018
- Bush, Eliot C.; Clark, Anne E.; DeRanek, Carissa A.
- BMC Bioinformatics, Vol. 19, Issue 1
Vargas: heuristic-free alignment for assessing linear and graph read aligners
journal, April 2020
- Darby, Charlotte A.; Gaddipati, Ravi; Schatz, Michael C.
- Bioinformatics, Vol. 36, Issue 12
Demonstration of End-to-End Automation of DNA Data Storage
posted_content, October 2018
- Takahashi, Christopher N.; Nguyen, Bichlien H.; Strauss, Karin
- Scientific Reports
Edlib: a C/C++ library for fast, exact sequence alignment using edit distance
posted_content, August 2016
- Šošić, Martin; Šikić, Mile
- Bioinformatics
De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm
journal, April 2020
- Sahlin, Kristoffer; Medvedev, Paul
- Journal of Computational Biology, Vol. 27, Issue 4
Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information
journal, November 2017
- Yokota, Ryo; Kaminaga, Yuki; Kobayashi, Tetsuya J.
- Frontiers in Immunology, Vol. 8
A Systematic Approach to Bacterial Phylogeny Using Order Level Sampling and Identification of HGT Using Network Science
journal, February 2020
- Khaledian, Ehdieh; Brayton, Kelly A.; Broschat, Shira L.
- Microorganisms, Vol. 8, Issue 2