DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The value of protein structure classification information-Surveying the scientific literature

Abstract

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.

Authors:
Fox, Naomi K. [1]; Brenner, Steven E. [2]; Chandonia, John-Marc [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
Publication Date:
2015-08
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
National Institutes of Health (NIH); USDOE
OSTI Identifier:
1378622
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Proteins
Additional Journal Information:
Journal Volume: 83; Journal Issue: 11; Journal ID: ISSN 0887-3585
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; SCOP; CATH; database; curation; resources

Citation Formats

Fox, Naomi K., Brenner, Steven E., and Chandonia, John-Marc. The value of protein structure classification information-Surveying the scientific literature. United States: N. p., 2015. Web. https://doi.org/10.1002/prot.24915.
Fox, Naomi K., Brenner, Steven E., & Chandonia, John-Marc. The value of protein structure classification information-Surveying the scientific literature. United States. https://doi.org/10.1002/prot.24915
Fox, Naomi K., Brenner, Steven E., and Chandonia, John-Marc. 2015. "The value of protein structure classification information-Surveying the scientific literature". United States. https://doi.org/10.1002/prot.24915. https://www.osti.gov/servlets/purl/1378622.
@article{osti_1378622,
title = {The value of protein structure classification information-Surveying the scientific literature},
author = {Fox, Naomi K. and Brenner, Steven E. and Chandonia, John-Marc},
abstractNote = {The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.},
doi = {10.1002/prot.24915},
journal = {Proteins},
number = 11,
volume = 83,
place = {United States},
year = {2015},
month = {8}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 10 works
Citation information provided by
Web of Science
