Title: Parallel seed-based approach to multiple protein structure similarities detection

Abstract

Finding similarities between protein structures is a crucial task in molecular biology. Most existing tools require proteins to be aligned in an order-preserving way and find only a single alignment even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial, and it comes with a quality guarantee: the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distance-based) lower than a given threshold, if such alignments exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as for detecting structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: coarse-grained parallelism that uses the available CPU cores, and fine-grained parallelism that exploits bit-level concurrency as well as vector instructions.
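
To make the internal-distance-based deviation mentioned in the abstract concrete, here is a minimal sketch of how such a score could be computed for a nonsequential alignment. It is an illustration under stated assumptions, not the authors' implementation: the function name `internal_distance_rmsd`, the representation of an alignment as a list of index pairs, and the normalization by the number of residue pairs are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def internal_distance_rmsd(coords_a, coords_b, alignment):
    """Root mean squared deviation between internal distances.

    coords_a, coords_b: lists of (x, y, z) tuples, one per residue.
    alignment: list of (i, j) pairs matching residue i of protein A to
               residue j of protein B; the pairs need not be order
               preserving (nonsequential alignments are allowed).

    Assumption for this sketch: the mean is taken over all unordered
    pairs of aligned residues. Other normalizations are possible.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two aligned residue pairs")

    squared_diffs = []
    for (i1, j1), (i2, j2) in combinations(alignment, 2):
        d_a = math.dist(coords_a[i1], coords_a[i2])  # distance inside A
        d_b = math.dist(coords_b[j1], coords_b[j2])  # distance inside B
        squared_diffs.append((d_a - d_b) ** 2)
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


# Toy example: B is A translated, with its residues listed in another order.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coords_b = [(5.0, 6.0, 5.0), (5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]
alignment = [(0, 1), (1, 2), (2, 0)]  # not order preserving
score = internal_distance_rmsd(coords_a, coords_b, alignment)
```

Because only distances within each structure are compared, this score needs no superposition step and is unaffected by the order in which residues are matched. The coordinate-based deviation, by contrast, requires first finding an optimal rigid superposition of the aligned residues.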

Authors:
Chapuis, Guillaume [1]; Le Boudic-Jamin, Mathilde [1]; Andonov, Rumen [1]; Djidjev, Hristo [2]; Lavenier, Dominique [1]
  1. INRIA/IRISA and Univ. of Rennes, Rennes Cedex (France)
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Publication Date:
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1201429
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Scientific Programming
Additional Journal Information:
Journal Volume: 2015; Journal Issue: 20; Journal ID: ISSN 1058-9244
Publisher:
Hindawi
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING

Citation Formats

Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, and Lavenier, Dominique. Parallel seed-based approach to multiple protein structure similarities detection. United States: N. p., 2015. Web. doi:10.1155/2015/279715.
Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, & Lavenier, Dominique. Parallel seed-based approach to multiple protein structure similarities detection. United States. https://doi.org/10.1155/2015/279715
Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, and Lavenier, Dominique. 2015. "Parallel seed-based approach to multiple protein structure similarities detection". United States. https://doi.org/10.1155/2015/279715. https://www.osti.gov/servlets/purl/1201429.
@article{osti_1201429,
title = {Parallel seed-based approach to multiple protein structure similarities detection},
author = {Chapuis, Guillaume and Le Boudic-Jamin, Mathilde and Andonov, Rumen and Djidjev, Hristo and Lavenier, Dominique},
abstractNote = {Finding similarities between protein structures is a crucial task in molecular biology. Most existing tools require proteins to be aligned in an order-preserving way and find only a single alignment even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial, and it comes with a quality guarantee: the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distance-based) lower than a given threshold, if such alignments exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as for detecting structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: coarse-grained parallelism that uses the available CPU cores, and fine-grained parallelism that exploits bit-level concurrency as well as vector instructions.},
doi = {10.1155/2015/279715},
url = {https://www.osti.gov/biblio/1201429},
journal = {Scientific Programming},
issn = {1058-9244},
number = 20,
volume = 2015,
place = {United States},
year = {2015},
month = {jan}
}

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science
