
Title: Multi-scale structural analysis of proteins by deep semantic segmentation

Journal Article · Bioinformatics

Abstract

Motivation: Recent advances in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation, a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structure quality assessment.

Results: We train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model achieves a high per-residue accuracy of 90.8% on the test set (95.0% average per-class accuracy; 87.8% average per-structure accuracy). We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design.

Availability and implementation: The trained classifier network, parser network, and entropy calculation scripts are available for download at https://git.io/fp6bd, with detailed usage instructions provided at the download page. A step-by-step tutorial for setup is provided at https://goo.gl/e8GB2S. All Rosetta commands, RosettaRemodel blueprints, and predictions for all datasets used in the study are available in the Supplementary Information.

Supplementary information: Supplementary data are available at Bioinformatics online.
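The Results report three different accuracy figures (per-residue, average per-class, and average per-structure), and these can diverge on the same predictions. The following Python sketch illustrates one plausible way to compute each of them. It is not the authors' evaluation code: the function name `accuracy_metrics`, the input layout, and the exact averaging definitions are assumptions made for illustration, and the class labels "A" and "B" are placeholders rather than real CATH architecture identifiers.

```python
from collections import defaultdict


def accuracy_metrics(structures):
    """Compute three accuracy summaries from per-residue labels.

    `structures` is a list of (true_labels, predicted_labels) pairs, one
    pair per protein structure; each element is a list with one class
    label per residue. This layout is a hypothetical convention.
    """
    total = 0
    correct = 0
    class_total = defaultdict(int)    # residues whose true class is c
    class_correct = defaultdict(int)  # of those, how many were predicted as c
    structure_scores = []

    for true, pred in structures:
        if len(true) != len(pred) or not true:
            raise ValueError("each structure needs matching, non-empty label lists")
        hits = 0
        for t, p in zip(true, pred):
            class_total[t] += 1
            if t == p:
                class_correct[t] += 1
                hits += 1
        total += len(true)
        correct += hits
        structure_scores.append(hits / len(true))

    # Pooled over all residues: large structures and common classes dominate.
    per_residue = correct / total
    # Assumed definition: unweighted mean of the accuracy within each true class.
    per_class = sum(class_correct[c] / class_total[c] for c in class_total) / len(class_total)
    # Assumed definition: unweighted mean of the accuracy within each structure.
    per_structure = sum(structure_scores) / len(structure_scores)
    return per_residue, per_class, per_structure


# Toy input: two structures with placeholder labels.
example = [
    (["A", "A", "A", "B"], ["A", "A", "A", "A"]),
    (["B"], ["B"]),
]
per_residue, per_class, per_structure = accuracy_metrics(example)
print(per_residue, per_class, per_structure)
```

On this toy input the three numbers differ because each one weights the same residue-level errors differently: by residue, by class, or by structure.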

Sponsoring Organization: USDOE
OSTI ID: 1604834
Journal Information: Bioinformatics, Vol. 36, Issue 6; ISSN 1367-4803
Publisher: Oxford University Press
Country of Publication: United Kingdom
Language: English
Citation Metrics: Cited by 5 works (citation information provided by Web of Science)
