OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Protein Classification Based on Analysis of Local Sequence-Structure Correspondence

Abstract

The goal of this project was to develop an algorithm that detects and calculates common structural motifs in compared structures, and to define a set of numerical criteria for fully automated, motif-based protein structure classification. The Protein Data Bank (PDB) contains more than 33,000 experimentally solved protein structures, and the Structural Classification of Proteins (SCOP) database, a manual classification of these structures, cannot keep pace with the rapid growth of the PDB. In our approach, called STRALCP (STRucture Alignment based Clustering of Proteins), we generate detailed information about global and local similarities among a given set of structures, identify similar fragments that are conserved among the analyzed proteins, and use these conserved regions (detected structural motifs) to classify proteins.
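The abstract describes the pipeline only at a high level. As a rough illustration, and not the STRALCP implementation, its last two steps can be sketched as follows, assuming residue correspondences and superpositions have already been produced by some structure-alignment tool. Conserved fragments are taken here as maximal runs of aligned residues whose deviation stays under a distance cutoff, and proteins are then grouped by the overlap of their motif sets. The function names, the 2.0 Å cutoff, the minimum fragment length of 5, the Jaccard threshold of 0.5 and the single-linkage grouping rule are all hypothetical choices.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def conserved_fragments(
    deviations: Sequence[float], cutoff: float = 2.0, min_len: int = 5
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of maximal runs of aligned
    residues whose deviation (assumed: CA-CA distance in angstroms after
    superposition) is <= cutoff and whose length is >= min_len."""
    fragments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, d in enumerate(deviations):
        if d <= cutoff:
            if start is None:
                start = i  # a new candidate fragment begins here
        else:
            if start is not None and i - start >= min_len:
                fragments.append((start, i))
            start = None
    # Close a fragment that runs to the end of the alignment.
    if start is not None and len(deviations) - start >= min_len:
        fragments.append((start, len(deviations)))
    return fragments


def cluster_by_motifs(
    motifs: Dict[str, Set[str]], threshold: float = 0.5
) -> List[Set[str]]:
    """Hypothetical single-linkage grouping: two proteins are linked when the
    Jaccard similarity of their motif sets is >= threshold; clusters are the
    connected components of that graph, tracked with union-find."""
    names = sorted(motifs)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = motifs[a] | motifs[b]
            if union and len(motifs[a] & motifs[b]) / len(union) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    # Deterministic order: sort clusters by their sorted member names.
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    devs = [0.5] * 6 + [3.0] + [1.0] * 3
    print(conserved_fragments(devs))  # [(0, 6)]
    print(cluster_by_motifs({"p1": {"m1", "m2"}, "p2": {"m1", "m2", "m3"}, "p3": {"m9"}}))
```

Single linkage is used only because it is the simplest rule to state; a real classifier would need the numerical criteria the abstract refers to.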

Authors:
Zemla, A T
Publication Date:
February 13, 2006
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
893991
Report Number(s):
UCRL-TR-218946
TRN: US200701%%92
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; ALIGNMENT; CLASSIFICATION; PROTEIN STRUCTURE; PROTEINS

Citation Formats

Zemla, A T. Protein Classification Based on Analysis of Local Sequence-Structure Correspondence. United States: N. p., 2006. Web. doi:10.2172/893991.
Zemla, A T. (2006). Protein Classification Based on Analysis of Local Sequence-Structure Correspondence. United States. doi:10.2172/893991.
Zemla, A T. 2006. "Protein Classification Based on Analysis of Local Sequence-Structure Correspondence". United States. doi:10.2172/893991. https://www.osti.gov/servlets/purl/893991.
@techreport{osti_893991,
title = {Protein Classification Based on Analysis of Local Sequence-Structure Correspondence},
author = {Zemla, A T},
abstractNote = {The goal of this project was to develop an algorithm that detects and calculates common structural motifs in compared structures, and to define a set of numerical criteria for fully automated, motif-based protein structure classification. The Protein Data Bank (PDB) contains more than 33,000 experimentally solved protein structures, and the Structural Classification of Proteins (SCOP) database, a manual classification of these structures, cannot keep pace with the rapid growth of the PDB. In our approach, called STRALCP (STRucture Alignment based Clustering of Proteins), we generate detailed information about global and local similarities among a given set of structures, identify similar fragments that are conserved among the analyzed proteins, and use these conserved regions (detected structural motifs) to classify proteins.},
doi = {10.2172/893991},
institution = {Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)},
number = {UCRL-TR-218946},
place = {United States},
year = {2006},
month = feb
}

Similar Records:
  • Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real time, and require no retraining to classify novel sequences. We have adapted ART networks to support scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.
  • An automated procedure for protein design by optimization of a sequence-structure quality has been developed. The method selects a statistically optimal sequence for a particular structure, on the assumption that such a protein will adopt the desired structure. We present two optimization algorithms: one provides an exact optimization while the other uses a combinatorial technique for comparatively rapid results. Both are suitable for massively parallel computers. A prototype system was used to design sequences which should adopt the four-helix bundle conformation of myohemerythrin. These appear satisfactory under secondary structure and profile analysis. Detailed inspection reveals that the sequences are generally plausible but, as expected, lack some specific structural features. The design parameters provide some insight into the general determinants of protein structure.
  • A method for the quantitative comparison of two classification rules applied to the protein folding problem is presented. Classifications of proteins based on sequence homology and on amino acid composition were compared and analyzed according to this approach. The coefficient of correlation between these classification methods and a procedure for estimating the robustness of the coefficient are discussed.
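The last record does not say which correlation coefficient it uses. One common way to correlate two classifications of the same proteins, assumed here and not necessarily that report's coefficient, is to record for every pair of proteins whether each classification places them in the same class, then take the Pearson (phi) correlation of the two binary vectors. The function names are hypothetical, and the robustness estimate mentioned in the record is not shown.

```python
from itertools import combinations
from math import sqrt
from typing import Hashable, List, Sequence


def pair_indicators(labels: Sequence[Hashable]) -> List[int]:
    """1 if items i < j share a class, else 0, for every unordered pair."""
    return [int(labels[i] == labels[j]) for i, j in combinations(range(len(labels)), 2)]


def classification_correlation(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Pearson (phi) correlation between the same-class indicators of two
    classifications of the same items (an assumed choice of coefficient)."""
    if len(a) != len(b):
        raise ValueError("both classifications must label the same items")
    x, y = pair_indicators(a), pair_indicators(b)
    n = len(x)
    if n == 0:
        raise ValueError("need at least two items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    if vx == 0 or vy == 0:
        # Every pair is "same class" (or every pair "different") in one labeling.
        raise ValueError("correlation undefined for a constant pair indicator")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    by_homology = ["a", "a", "b", "b"]
    print(classification_correlation(by_homology, [1, 1, 2, 2]))
    print(classification_correlation(by_homology, [1, 2, 1, 2]))
```

With four proteins, identical groupings give 1.0 up to floating-point rounding, while the crossed grouping `[1, 2, 1, 2]` gives about -0.5. Comparing pair indicators rather than raw labels makes the measure independent of how the classes are named.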