DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

Journal Article · PeerJ
DOI: https://doi.org/10.7717/peerj.3486 · OSTI ID:1423937
 [1];  [2]
  1. Univ. of Nevada Reno, Reno NV (United States). Dept. of Biochemistry and Molecular Biology; University of Nevada Reno
  2. Univ. of Nevada Reno, Reno NV (United States). Dept. of Biochemistry and Molecular Biology

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take extremely long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it still has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Simple solutions for data parallelization are therefore needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. To accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach, in which the query sequences are divided among independent BLAST jobs that run in parallel. Scaling from one to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically, and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool therefore simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
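The query sequence distribution idea can be illustrated with a minimal Python sketch. It is not DCBLAST's implementation: the greedy length-balancing rule, the function names (`parse_fasta`, `split_by_length`, `to_fasta`, `blast_command`, `merge_tabular`), the chunk file names, and the database name `my_db` are all hypothetical. Only the BLAST+ options `-query`, `-db`, `-out`, and `-outfmt 6` are standard BLAST+ command-line flags.

```python
# Hypothetical sketch of query distribution for BLAST; NOT DCBLAST's own code.
import heapq
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse multi-FASTA text into (header, sequence) records."""
    records: List[Record] = []
    header: Optional[str] = None
    parts: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_by_length(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Assumed balancing rule: give each query (longest first) to the chunk
    with the fewest residues so far, so chunks carry similar amounts of work."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    n = max(1, min(n_chunks, len(records)))  # never create empty chunks
    chunks: List[List[Record]] = [[] for _ in range(n)]
    heap = [(0, i) for i in range(n)]  # (residues assigned, chunk index)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, i = heapq.heappop(heap)
        chunks[i].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), i))
    return chunks


def to_fasta(chunk: List[Record]) -> str:
    """Serialize one chunk back to FASTA text (one sequence line per record)."""
    return "".join(f">{h}\n{s}\n" for h, s in chunk)


def blast_command(program: str, query: str, db: str, out: str) -> List[str]:
    """Build one BLAST+ command line for one chunk, with tabular output."""
    return [program, "-query", query, "-db", db, "-out", out, "-outfmt", "6"]


def merge_tabular(parts: List[str]) -> str:
    """Merge per-chunk -outfmt 6 results; each hit is an independent line."""
    return "".join(p if p.endswith("\n") or not p else p + "\n" for p in parts)


if __name__ == "__main__":
    demo = ">q1\nACGTACGTAC\n>q2\nACG\n>q3\nACGTAC\n>q4\nAC\n"
    for k, chunk in enumerate(split_by_length(parse_fasta(demo), 2)):
        # Hypothetical names: each chunk would be written to its own file
        # and submitted to the cluster as an independent job.
        cmd = blast_command("blastn", f"chunk_{k}.fa", "my_db", f"chunk_{k}.tsv")
        print(" ".join(cmd), "#", [h for h, _ in chunk])
```

Balancing by residue count rather than by sequence count is one plausible choice, since BLAST run time tends to grow with query length; a simpler split into equal numbers of sequences would also work. Tabular output is used here because per-chunk results can then be merged by plain concatenation.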

Research Organization:
Univ. of Nevada Reno, Reno NV (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23). Biological Systems Science Division
Grant/Contract Number:
SC0008834
OSTI ID:
1423937
Journal Information:
PeerJ, Vol. 5; ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
