DOE PAGES · U.S. Department of Energy, Office of Scientific and Technical Information

Title: Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric

Journal Article · Algorithms
DOI: https://doi.org/10.3390/a8040850 · OSTI ID:1329875
 [1];  [2];  [3];  [4];  [5]
  1. Univ. of Rennes 1 (France); National Inst. of Research in Computer Science and Automation (INRIA), Rennes (France)
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  3. National Research Inst. for Mathematics and Computer Science (CWI), Amsterdam (Netherlands). Life Sciences
  4. Univ. of Rennes 1 (France); National Inst. of Research in Computer Science and Automation (INRIA), Rennes (France)
  5. Univ. of Duisburg-Essen, Essen (Germany). Genome Informatics; Univ. of Lubeck (Germany). Inst. of Neurogenetics and for Integrative and Experimental Genomics. Platform for Genome Analytics

In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric on that space allows one to avoid pairwise comparisons against the entire database and thus significantly accelerates exploration of the protein space compared to non-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 224 out of 236 queries correctly and, on a larger, extended version of the benchmark with 60,850 additional structures, up to 1361 out of 1369 queries. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.
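
To illustrate why a metric lets a search skip most pairwise comparisons, the following is a minimal sketch of nearest-neighbor search with triangle-inequality pruning against a single precomputed pivot. It is not the authors' implementation: the Jaccard distance on sets of integers is only a toy stand-in for the max-CMO metric, and the names `jaccard_distance` and `nearest_with_pivot` as well as the example data are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Item = FrozenSet[int]


def jaccard_distance(a: Item, b: Item) -> float:
    """Toy stand-in metric (NOT max-CMO): 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_with_pivot(
    query: Item,
    database: Sequence[Item],
    pivot: Item,
    pivot_dists: Sequence[float],
    dist: Callable[[Item, Item], float],
) -> Tuple[int, float, int]:
    """Exact 1-NN search; returns (index, distance, distance evaluations).

    For any metric, d(q, x) >= |d(q, p) - d(p, x)|. The right-hand side
    needs only one new distance evaluation, d(q, p), because d(p, x) was
    precomputed for every database item x.
    """
    d_qp = dist(query, pivot)
    evaluations = 1
    best_idx, best_d = -1, float("inf")
    # Visit candidates by increasing lower bound so the loop can stop early.
    order = sorted(range(len(database)), key=lambda i: abs(d_qp - pivot_dists[i]))
    for i in order:
        lower_bound = abs(d_qp - pivot_dists[i])
        if lower_bound >= best_d:
            break  # no remaining candidate can be strictly closer
        d = dist(query, database[i])
        evaluations += 1
        if d < best_d:
            best_idx, best_d = i, d
    return best_idx, best_d, evaluations


# Hypothetical example data: each set stands in for a contact map.
database = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({10, 11}),
    frozenset({10, 11, 12}),
    frozenset({20}),
]
pivot = database[0]
pivot_dists = [jaccard_distance(pivot, x) for x in database]  # offline step
query = frozenset({10, 11, 13})

idx, d, n_evals = nearest_with_pivot(
    query, database, pivot, pivot_dists, jaccard_distance
)
```

A single pivot is the simplest case; using several pivots tightens the lower bound, and the same pruning idea extends to k-NN by comparing the bound against the current k-th best distance.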

Research Organization:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
Contributing Organization:
Univ. of Rennes 1 (France); National Inst. of Research in Computer Science and Automation (INRIA); National Research Inst. for Mathematics and Computer Science (CWI); Univ. of Duisburg-Essen, Essen (Germany); Univ. of Lubeck (Germany)
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1329875
Report Number(s):
LA-UR--15-24867
Journal Information:
Journal Name: Algorithms; Journal Volume: 8; Journal Issue: 4; ISSN 1999-4893
Publisher:
MDPI
Country of Publication:
United States
Language:
English
