DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The value of protein structure classification information-Surveying the scientific literature

Abstract

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.

Authors:
Fox, Naomi K. [1]; Brenner, Steven E. [2]; Chandonia, John-Marc [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
National Institutes of Health (NIH); USDOE
OSTI Identifier:
1378622
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Proteins
Additional Journal Information:
Journal Volume: 83; Journal Issue: 11; Journal ID: ISSN 0887-3585
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; SCOP; CATH; database; curation; resources

Citation Formats

Fox, Naomi K., Brenner, Steven E., and Chandonia, John-Marc. The value of protein structure classification information-Surveying the scientific literature. United States: N. p., 2015. Web. doi:10.1002/prot.24915.
Fox, Naomi K., Brenner, Steven E., & Chandonia, John-Marc. The value of protein structure classification information-Surveying the scientific literature. United States. doi:10.1002/prot.24915.
Fox, Naomi K., Brenner, Steven E., and Chandonia, John-Marc. "The value of protein structure classification information-Surveying the scientific literature". United States. doi:10.1002/prot.24915. https://www.osti.gov/servlets/purl/1378622.
@article{osti_1378622,
title = {The value of protein structure classification information-Surveying the scientific literature},
author = {Fox, Naomi K. and Brenner, Steven E. and Chandonia, John-Marc},
doi = {10.1002/prot.24915},
journal = {Proteins},
number = 11,
volume = 83,
place = {United States},
year = {2015},
month = {8}
}

Citation Metrics:
Cited by: 10 works
Citation information provided by
Web of Science
