Parallel seed-based approach to multiple protein structure similarities detection
Abstract
Finding similarities between protein structures is a crucial task in molecular biology. Most existing tools require proteins to be aligned in an order-preserving way and find only a single alignment even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial, and it comes with a quality guarantee: the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distance-based) lower than a given threshold, if such alignments exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable both for detecting similar domains when comparing multidomain proteins and for detecting structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed through extensive use of parallel computing techniques: coarse-grained parallelism that uses the available CPU cores, and fine-grained parallelism that exploits bit-level concurrency as well as vector instructions.
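The quality criterion and the sequential/nonsequential distinction can be made concrete with a short sketch. It is written in Python and assumes a hypothetical representation of an alignment as a list of `(i, j)` pairs, meaning residue `i` of structure A is matched with residue `j` of structure B. It uses one common definition of the internal-distance RMSD (dRMSD), averaged over all unordered pairs of aligned residues, and it omits the coordinate-based RMSD because that measure needs an optimal superposition first. The function names `drmsd`, `is_order_preserving`, and `accept` are illustrative only. This is not the authors' seed-based algorithm or their parallel implementation.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b, alignment):
    """Internal-distance RMSD over the aligned residue pairs.

    Assumed convention: average the squared differences of intra-structure
    distances over all unordered pairs of aligned residues.
    """
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two aligned residue pairs")
    total = 0.0
    for (i1, j1), (i2, j2) in combinations(alignment, 2):
        d_a = math.dist(coords_a[i1], coords_a[i2])  # distance inside structure A
        d_b = math.dist(coords_b[j1], coords_b[j2])  # matching distance inside B
        total += (d_a - d_b) ** 2
    return math.sqrt(total / (n * (n - 1) / 2))


def is_order_preserving(alignment):
    """True if, after sorting by A-index, both index sequences strictly increase.

    A sequential (order-preserving) alignment satisfies this; a nonsequential
    alignment, such as one matching swapped or permuted domains, does not.
    """
    ordered = sorted(alignment)
    return all(a1 < a2 and b1 < b2
               for (a1, b1), (a2, b2) in zip(ordered, ordered[1:]))


def accept(coords_a, coords_b, alignment, threshold):
    """Hypothetical acceptance test: dRMSD strictly below the threshold."""
    return drmsd(coords_a, coords_b, alignment) < threshold


if __name__ == "__main__":
    # Four residues on a square with 3.8 Angstrom sides (a toy structure).
    a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    # Structure B: the same residues, translated and stored in reverse order.
    b = [(x + 10.0, y, z) for (x, y, z) in reversed(a)]
    aln = [(0, 3), (1, 2), (2, 1), (3, 0)]
    print("dRMSD:", drmsd(a, b, aln))
    print("order preserving:", is_order_preserving(aln))
    print("accepted at 0.5:", accept(a, b, aln, 0.5))
```

Because dRMSD compares only internal distances, it needs no superposition and is unchanged by rigid translations and rotations. In the example, the reversed matching is accepted with a near-zero dRMSD even though it is not order preserving, which is the kind of alignment that sequential tools cannot report.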
- Authors:
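- Chapuis, Guillaume; Le Boudic-Jamin, Mathilde; Andonov, Rumen; Djidjev, Hristo; Lavenier, Dominique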
- INRIA/IRISA and Univ. of Rennes, Rennes Cedex (France)
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Publication Date:
- 2015
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1201429
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- Scientific Programming
- Additional Journal Information:
- Journal Volume: 2015; Journal Issue: 20; Journal ID: ISSN 1058-9244
- Publisher:
- Hindawi
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING
Citation Formats
Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, and Lavenier, Dominique. Parallel seed-based approach to multiple protein structure similarities detection. United States: N. p., 2015. Web. doi:10.1155/2015/279715.
Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, & Lavenier, Dominique. Parallel seed-based approach to multiple protein structure similarities detection. United States. https://doi.org/10.1155/2015/279715
Chapuis, Guillaume, Le Boudic-Jamin, Mathilde, Andonov, Rumen, Djidjev, Hristo, and Lavenier, Dominique. 2015. "Parallel seed-based approach to multiple protein structure similarities detection". United States. https://doi.org/10.1155/2015/279715. https://www.osti.gov/servlets/purl/1201429.
@article{osti_1201429,
title = {Parallel seed-based approach to multiple protein structure similarities detection},
author = {Chapuis, Guillaume and Le Boudic-Jamin, Mathilde and Andonov, Rumen and Djidjev, Hristo and Lavenier, Dominique},
abstractNote = {Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a quality guarantee—the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distances based) lower than a given threshold, if such exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as to detect structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: a coarse-grain level parallelism making use of available CPU cores for computation and a fine-grain level parallelism exploiting bit-level concurrency as well as vector instructions.},
doi = {10.1155/2015/279715},
url = {https://www.osti.gov/biblio/1201429},
journal = {Scientific Programming},
issn = {1058-9244},
number = 20,
volume = 2015,
place = {United States},
year = {2015}
}