DOE PAGES | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

Abstract

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
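The "query sequence distribution approach" named in the abstract can be illustrated with a minimal sketch. This is not DCBLAST's actual code: the function names (`read_fasta`, `split_queries`, `to_fasta`) and the round-robin assignment are assumptions made for illustration only. The idea shown is that a multi-sequence FASTA query is divided into independent chunks, each of which could be searched against the same database by a separate job, with the per-chunk outputs concatenated afterwards.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence); a hypothetical helper type


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq_lines)
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_queries(text: str, n_chunks: int) -> List[List[Record]]:
    """Distribute query records across n_chunks lists, round-robin.

    Round-robin is an assumption of this sketch; a real tool might
    balance chunks by sequence length instead.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(read_fasta(text)):
        chunks[i % n_chunks].append(record)
    return chunks


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence line each."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


example = ">q1\nACGT\nAC\n>q2\nGGCC\n>q3\nTTAA\n"
chunks = split_queries(example, 2)
for n, chunk in enumerate(chunks):
    print(f"chunk {n}: {[h for h, _ in chunk]}")
```

Because each query sequence is searched independently of the others, each chunk written out with `to_fasta` could be handed to its own BLAST job, which is what makes this kind of data parallelism straightforward to schedule on a cluster.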

Authors:
Yim, Won Cheol [1]; Cushman, John C. [1]
  1. Univ. of Nevada Reno, Reno NV (United States). Dept. of Biochemistry and Molecular Biology
Publication Date:
July 2017
Research Org.:
Univ. of Nevada Reno, Reno NV (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23). Biological Systems Science Division
OSTI Identifier:
1423937
Grant/Contract Number:  
sc0008834
Resource Type:
Accepted Manuscript
Journal Name:
PeerJ
Additional Journal Information:
Journal Volume: 5; Journal ID: ISSN 2167-8359
Publisher:
PeerJ Inc.
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Agricultural Science; Bioinformatics; Computational Biology; Plant Science; BLAST; Sequence similarity; Parallel processing; Environment; Distributed computing; HPC

Citation Formats

Yim, Won Cheol, and Cushman, John C. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments. United States: N. p., 2017. Web. doi:10.7717/peerj.3486.
Yim, Won Cheol, & Cushman, John C. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments. United States. doi:10.7717/peerj.3486.
Yim, Won Cheol, and Cushman, John C. Sat . "Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments". United States. doi:10.7717/peerj.3486. https://www.osti.gov/servlets/purl/1423937.
@article{osti_1423937,
title = {Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments},
author = {Yim, Won Cheol and Cushman, John C.},
abstractNote = {Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.},
doi = {10.7717/peerj.3486},
journal = {PeerJ},
volume = 5,
place = {United States},
year = {2017},
month = {7}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science

