A Scalable Parallel Algorithm for Large-Scale Protein Sequence Homology Detection

Wu, Changjun; Kalyanaraman, Anantharaman; Cannon, William R

doi:10.1109/ICPP.2010.41

Conference · September 13, 2010

DOI: https://doi.org/10.1109/ICPP.2010.41 · OSTI ID: 1043143

Protein sequence homology detection is a fundamental problem in computational molecular biology, with pervasive application in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting homology between two protein sequences is computationally inexpensive, detecting pairwise homology at a large scale becomes prohibitive, requiring millions of CPU hours. Yet, there is currently no efficient method available to parallelize this kernel. In this paper, we present the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm that is suited for large-scale protein sequence data. Our method, called pGraph, is designed using a hierarchical multiple-master multiple-worker model, where the processor space is partitioned into subgroups and the hierarchy helps ensure that the workload is distributed in a load-balanced fashion despite the inherent irregularity that may originate in the input. Experimental evaluation demonstrates that our method scales linearly on all input sizes tested (up to 640K sequences) on a 1,024-node supercomputer. In addition to demonstrating strong scaling, we present an extensive study of the various components of the system and related parametric studies.
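
As a rough illustration of the hierarchical multiple-master multiple-worker idea described in the abstract, the toy simulation below partitions ranks into subgroups, lets each subgroup's master pull batches of tasks from a shared pool, and lets idle workers pull one task at a time from their own master. This is only a sketch under stated assumptions, not pGraph's implementation: it is sequential (no MPI), and the function names, the `batch_size` parameter, and the numeric task costs standing in for irregular pairwise comparisons are all hypothetical.

```python
import heapq
from collections import deque


def partition_ranks(num_ranks, group_size):
    """Split ranks 0..num_ranks-1 into subgroups (hypothetical layout:
    the first rank of each subgroup acts as master, the rest as workers)."""
    groups = []
    for start in range(0, num_ranks, group_size):
        ranks = list(range(start, min(start + group_size, num_ranks)))
        groups.append({"master": ranks[0], "workers": ranks[1:]})
    return groups


def simulate_two_level(task_costs, num_ranks, group_size, batch_size):
    """Simulate dynamic two-level task distribution.

    task_costs stands in for irregular per-task work. batch_size is
    assumed to be positive. Returns the total cost handled per worker.
    """
    groups = partition_ranks(num_ranks, group_size)
    global_pool = deque(task_costs)            # shared pool of pending tasks
    local_queues = [deque() for _ in groups]   # one queue per master
    load = {w: 0 for g in groups for w in g["workers"]}

    # Min-heap of (time the worker becomes free, worker rank, group index).
    idle = [(0, w, gi) for gi, g in enumerate(groups) for w in g["workers"]]
    heapq.heapify(idle)

    while idle:
        free_at, worker, gi = heapq.heappop(idle)
        queue = local_queues[gi]
        if not queue:
            # The master refills its local queue with a batch from the pool.
            for _ in range(min(batch_size, len(global_pool))):
                queue.append(global_pool.popleft())
        if not queue:
            continue  # no work left anywhere for this subgroup: worker retires
        cost = queue.popleft()
        load[worker] += cost
        heapq.heappush(idle, (free_at + cost, worker, gi))
    return load


# One expensive task followed by many cheap ones (an irregular workload).
loads = simulate_two_level([5] + [1] * 11, num_ranks=8, group_size=4, batch_size=2)
```

Because workers request tasks only when they become free, a worker stuck on an expensive task simply takes fewer tasks overall, which is the basic intuition behind dynamic load balancing for irregular inputs.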

OSTI does not have a digital full text copy available.

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 1043143

Report Number(s): PNNL-SA-76029; KJ0403000; TRN: US201213%%108

Resource Relation: Conference: 39th International Conference on Parallel Processing (ICPP2010), September 13-16, 2010, San Diego, California, 333-342

Country of Publication: United States

Language: English

Similar Records

pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs

Journal Article · September 15, 2012 · IEEE Transactions on Parallel and Distributed Systems, 23(10):1923-1933 · OSTI ID:1043143

Wu, Changjun; Kalyanaraman, Anantharaman; Cannon, William R

Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

Thesis/Dissertation · May 1, 2015 · OSTI ID:1043143

Daily, Jeffrey A.

A work stealing based approach for enabling scalable optimal sequence homology detection

Journal Article · May 1, 2015 · Journal of Parallel and Distributed Computing · OSTI ID:1043143

Daily, Jeffrey A.; Kalyanaraman, Anantharaman; Krishnamoorthy, Sriram; +1 more

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
60 APPLIED LIFE SCIENCES
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ALGORITHMS
DETECTION
EVALUATION
MOLECULAR BIOLOGY
PARALLEL PROCESSING
PROTEINS
SCALING LAWS
