DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Parallel seed-based approach to multiple protein structure similarities detection

Abstract

Finding similarities between protein structures is a crucial task in molecular biology. Most existing tools require proteins to be aligned in an order-preserving way and find only a single alignment even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial, and it comes with a quality guarantee: the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distance-based) lower than a given threshold, if such alignments exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as for detecting structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: coarse-grained parallelism that uses the available CPU cores and fine-grained parallelism that exploits bit-level concurrency as well as vector instructions.
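
To make the internal-distance criterion mentioned in the abstract concrete, here is a minimal Python sketch of how an internal-distance RMSD could be computed for a nonsequential alignment. It is an illustration under stated assumptions, not the authors' implementation: the function names `internal_distance_rmsd` and `passes_threshold`, the representation of an alignment as a list of index pairs, and the normalization over unordered pairs are all hypothetical choices made for this example. The coordinate-based RMSD, which additionally requires an optimal superposition, is not shown.

```python
import math

def internal_distance_rmsd(coords_a, coords_b, alignment):
    """Internal-distance RMSD of an alignment (illustrative sketch).

    coords_a, coords_b: lists of (x, y, z) residue coordinates.
    alignment: list of (i, j) pairs matching residue i of A to residue j of B.
               The pairs may appear in any order (nonsequential alignment).
    Assumption: the mean is taken over unordered pairs of matched residues.
    """
    n = len(alignment)
    if n < 2:
        return 0.0  # no internal distance exists for fewer than two residues
    total = 0.0
    count = 0
    for x in range(n):
        for y in range(x + 1, n):
            (i1, j1), (i2, j2) = alignment[x], alignment[y]
            dist_a = math.dist(coords_a[i1], coords_a[i2])
            dist_b = math.dist(coords_b[j1], coords_b[j2])
            total += (dist_a - dist_b) ** 2
            count += 1
    return math.sqrt(total / count)

def passes_threshold(coords_a, coords_b, alignment, tau):
    """Return True if the internal-distance RMSD is below the threshold tau."""
    return internal_distance_rmsd(coords_a, coords_b, alignment) < tau

# Usage: B is a translated copy of A with its residues listed in another order.
protein_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
protein_b = [(5.0, 6.0, 5.0), (5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]
nonsequential = [(0, 1), (1, 2), (2, 0)]
print(passes_threshold(protein_a, protein_b, nonsequential, tau=0.5))
```

Because internal distances are unchanged by rotation and translation, this measure needs no superposition step, and nothing in it depends on the order of the matched pairs.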

Authors:
Chapuis, Guillaume [1]; Le Boudic-Jamin, Mathilde [1]; Andonov, Rumen [1]; Djidjev, Hristo [2]; Lavenier, Dominique [1]
  1. INRIA/IRISA and Univ. of Rennes, Rennes Cedex (France)
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1201429
Resource Type:
Accepted Manuscript
Journal Name:
Scientific Programming
Additional Journal Information:
Journal Volume: 2015; Journal Issue: 20; Journal ID: ISSN 1058-9244
Publisher:
Hindawi
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING

Citation Formats

Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, and Lavenier, Dominique. Parallel seed-based approach to multiple protein structure similarities detection. United States: N. p., 2015. Web. doi:10.1155/2015/279715.
Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, & Lavenier, Dominique. Parallel seed-based approach to multiple protein structure similarities detection. United States. doi:10.1155/2015/279715.
Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, and Lavenier, Dominique. 2015. "Parallel seed-based approach to multiple protein structure similarities detection". United States. doi:10.1155/2015/279715. https://www.osti.gov/servlets/purl/1201429.
@article{osti_1201429,
title = {Parallel seed-based approach to multiple protein structure similarities detection},
author = {Chapuis, Guillaume and Le Boudic-Jamin, Mathilde and Andonov, Rumen and Djidjev, Hristo and Lavenier, Dominique},
doi = {10.1155/2015/279715},
journal = {Scientific Programming},
number = 20,
volume = 2015,
place = {United States},
year = {2015},
month = {1}
}
