U.S. Department of Energy
Office of Scientific and Technical Information

High-throughput computation of pairwise sequence similarities for multiple genome comparison using ScalaBLAST

Conference
Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genome (IMG) system at the Joint Genome Institute (JGI). For a genome sequencing center to provide multiple-genome comparison capabilities, it must keep pace with an exponentially growing collection of sequence data, both from its own genomes and from public genomes. We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis. For example, it enables all-vs.-all BLAST runs across 2 million protein sequences within a day using thousands of processors, whereas conventional comparison methods would take years to complete.
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US), Environmental Molecular Sciences Laboratory (EMSL)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
935612
Report Number(s):
PNNL-SA-56203; 20905; KJ0101030
Country of Publication:
United States
Language:
English

Similar Records

Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond
Technical Report · June 1, 2007 · OSTI ID:960403

ScalaBLAST 2.0: Rapid and robust BLAST calculations on multiprocessor systems
Journal Article · March 15, 2013 · Bioinformatics, 29(6):797-8 · OSTI ID:1072883

ScalaBLAST: A Scalable Implementation of BLAST for High Performance Data-Intensive Bioinformatics Analysis
Journal Article · August 1, 2006 · IEEE Transactions on Parallel and Distributed Systems, 17(8):740-749 · OSTI ID:889526