OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: SCOPe: improvements to the structural classification of proteins – extended database to facilitate variant interpretation and machine learning

Journal Article · Nucleic Acids Research
DOI: https://doi.org/10.1093/nar/gkab1054 · OSTI ID: 1833446
  1. Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
  2. Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
  3. College of Engineering, University of California, Berkeley, CA 94720, USA
  4. Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  5. Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA; College of Engineering, University of California, Berkeley, CA 94720, USA

Abstract:
The Structural Classification of Proteins—extended (SCOPe, https://scop.berkeley.edu) knowledgebase aims to provide an accurate, detailed, and comprehensive description of the structural and evolutionary relationships amongst the majority of proteins of known structure, along with resources for analyzing the protein structures and their sequences. Structures from the PDB are divided into domains and classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.08, we have developed search and display tools for analysis of genetic variants we mapped to structures classified in SCOPe. In order to improve the utility of SCOPe to automated methods such as deep learning classifiers that rely on multiple alignment of sequences of homologous proteins, we have introduced new machine-parseable annotations that indicate aberrant structures as well as domains that are distinguished by a smaller repeat unit. We also classified structures from 74 of the largest Pfam families not previously classified in SCOPe, and we improved our algorithm to remove N- and C-terminal cloning, expression and purification sequences from SCOPe domains. SCOPe 2.08-stable classifies 106 976 PDB entries (about 60% of PDB entries).

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE; National Institutes of Health (NIH)
Grant/Contract Number:
AC02-05CH11231; R01-GM073109
OSTI ID:
1833446
Alternate ID(s):
OSTI ID: 1856227
Journal Information:
Nucleic Acids Research, Vol. 50, Issue D1; ISSN 0305-1048
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English
