OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

Abstract

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
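The query sequence distribution approach mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the simplest possible strategy: the query FASTA file is split into contiguous chunks of near-equal sequence count, each chunk is written to its own file, and one independent BLAST+ command is generated per chunk so that a cluster scheduler could run the chunks on separate cores or nodes. This is not DCBLAST's actual implementation. The helper names (`read_fasta`, `split_queries`, `write_chunks`, `blast_commands`) and the database name `my_db` are hypothetical. Only the standard BLAST+ options `-query`, `-db`, `-outfmt` and `-out` are used.

```python
"""Hypothetical sketch of query-sequence distribution for BLAST (not DCBLAST's code)."""
from pathlib import Path
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header = line[1:]
            seq_lines = []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_queries(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Split records into contiguous chunks whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    n_chunks = min(n_chunks, len(records)) or 1  # never create more chunks than queries
    base, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more query
        chunks.append(records[start:start + size])
        start += size
    return chunks


def write_chunks(chunks: List[List[Record]], outdir: Path, prefix: str = "query_chunk") -> List[Path]:
    """Write each chunk to its own FASTA file and return the file paths."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = outdir / f"{prefix}_{i:04d}.fa"
        with path.open("w") as fh:
            for header, seq in chunk:
                fh.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths


def blast_commands(paths: List[Path], db: str, program: str = "blastn") -> List[str]:
    """Build one independent BLAST+ command line per chunk (tabular output, -outfmt 6)."""
    return [
        f"{program} -query {p} -db {db} -outfmt 6 -out {p.with_suffix('.tsv')}"
        for p in paths
    ]


if __name__ == "__main__":
    import tempfile

    fasta = ">q1\nACGT\n>q2\nGGCC\n>q3\nTTAA\n>q4\nCCGG\n>q5\nAATT\n"
    chunks = split_queries(read_fasta(fasta), 2)
    with tempfile.TemporaryDirectory() as tmp:
        for cmd in blast_commands(write_chunks(chunks, Path(tmp)), db="my_db"):
            print(cmd)  # each line could be submitted as a separate cluster job
```

Because every chunk is searched against the same full database, merging the tabular (`-outfmt 6`) outputs is typically just a concatenation step once all jobs finish.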

Authors:
Yim, Won Cheol [1]; Cushman, John C. [1]
  1. Univ. of Nevada Reno, Reno NV (United States). Dept. of Biochemistry and Molecular Biology
Publication Date:
July 22, 2017
Research Org.:
Univ. of Nevada Reno, Reno NV (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23). Biological Systems Science Division
OSTI Identifier:
1423937
Grant/Contract Number:
sc0008834
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
PeerJ
Additional Journal Information:
Journal Volume: 5; Journal ID: ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Agricultural Science; Bioinformatics; Computational Biology; Plant Science; BLAST; Sequence similarity; Parallel processing; Environment; Distributed computing; HPC

Citation Formats

Yim, Won Cheol, and Cushman, John C. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments. United States: N. p., 2017. Web. doi:10.7717/peerj.3486.
Yim, Won Cheol, & Cushman, John C. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments. United States. doi:10.7717/peerj.3486.
Yim, Won Cheol, and Cushman, John C. 2017. "Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments". United States. doi:10.7717/peerj.3486. https://www.osti.gov/servlets/purl/1423937.
@article{osti_1423937,
title = {Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments},
author = {Yim, Won Cheol and Cushman, John C.},
abstractNote = {Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.},
doi = {10.7717/peerj.3486},
journal = {PeerJ},
volume = 5,
place = {United States},
year = {2017},
month = {jul}
}
