Title: The value of protein structure classification information-Surveying the scientific literature

Journal Article · Proteins
DOI: https://doi.org/10.1002/prot.24915 · OSTI ID: 1378622
 [1];  [2];  [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and to guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
National Institutes of Health (NIH); USDOE
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1378622
Journal Information:
Proteins, Vol. 83, Issue 11; ISSN 0887-3585
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 14 works
Citation information provided by
Web of Science
