SCOPe: improvements to the structural classification of proteins – extended database to facilitate variant interpretation and machine learning
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- College of Engineering, University of California, Berkeley, CA 94720, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA, Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA, College of Engineering, University of California, Berkeley, CA 94720, USA
Abstract
The Structural Classification of Proteins—extended (SCOPe, https://scop.berkeley.edu) knowledgebase aims to provide an accurate, detailed, and comprehensive description of the structural and evolutionary relationships amongst the majority of proteins of known structure, along with resources for analyzing the protein structures and their sequences. Structures from the PDB are divided into domains and classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.08, we have developed search and display tools for analysis of genetic variants we mapped to structures classified in SCOPe. In order to improve the utility of SCOPe to automated methods such as deep learning classifiers that rely on multiple alignment of sequences of homologous proteins, we have introduced new machine-parseable annotations that indicate aberrant structures as well as domains that are distinguished by a smaller repeat unit. We also classified structures from 74 of the largest Pfam families not previously classified in SCOPe, and we improved our algorithm to remove N- and C-terminal cloning, expression and purification sequences from SCOPe domains. SCOPe 2.08-stable classifies 106 976 PDB entries (about 60% of PDB entries).
- Research Organization:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE; National Institutes of Health (NIH)
- Grant/Contract Number:
- AC02-05CH11231; R01-GM073109
- OSTI ID:
- 1833446
- Alternate ID(s):
- OSTI ID: 1856227
- Journal Information:
- Journal Name: Nucleic Acids Research; Journal Volume: 50; Journal Issue: D1; ISSN 0305-1048
- Publisher:
- Oxford University Press
- Country of Publication:
- United Kingdom
- Language:
- English