Multi-scale structural analysis of proteins by deep semantic segmentation
Abstract
Motivation: Recent advances in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation, a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structure quality assessment.
Results: We train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model achieves a high per-residue accuracy of 90.8% on the test set (95.0% average per-class accuracy; 87.8% average per-structure accuracy). We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design.
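The three accuracy figures quoted in the abstract aggregate the same per-residue predictions in different ways. The sketch below illustrates one common reading of each, plus a Shannon entropy over a residue's class probabilities. It is not the authors' code: the function names, the input layout (one pair of label lists per protein), and the exact averaging conventions are assumptions made for illustration, and the published definitions may differ.

```python
import math
from collections import defaultdict


def segmentation_accuracies(structures):
    """Hypothetical helper: `structures` is a list of (true_labels, predicted_labels)
    pairs, one pair per protein, with one integer class label per residue."""
    total = correct = 0
    class_total = defaultdict(int)    # residues seen per true class
    class_correct = defaultdict(int)  # correctly labelled residues per true class
    structure_scores = []
    for true, pred in structures:
        if not true or len(true) != len(pred):
            raise ValueError("each structure needs matching, non-empty label lists")
        hits = 0
        for t, p in zip(true, pred):
            class_total[t] += 1
            if t == p:
                class_correct[t] += 1
                hits += 1
        total += len(true)
        correct += hits
        structure_scores.append(hits / len(true))
    # Per-residue: pool every residue from every structure.
    per_residue = correct / total
    # Average per-class: accuracy within each true class, then an unweighted mean.
    per_class = sum(class_correct[c] / class_total[c] for c in class_total) / len(class_total)
    # Average per-structure: accuracy within each protein, then an unweighted mean.
    per_structure = sum(structure_scores) / len(structure_scores)
    return per_residue, per_class, per_structure


def residue_entropy(probs):
    """Shannon entropy (in nats) of one residue's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Under these assumed definitions the three numbers can diverge: a rare class that is labelled well raises the per-class average more than the per-residue figure, while short, poorly labelled proteins lower the per-structure average more than the pooled one. An entropy near zero marks a residue the classifier assigns confidently to one class; the maximum for 38 classes is log(38).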
- Authors:
- Eguchi, Raphael R.; Huang, Po-Ssu
- Department of Biochemistry, School of Medicine, Stanford University, Shriram Center for Bioengineering and Chemical Engineering, 443 via Ortega, Room 036, Stanford, CA 94305, USA
- Department of Bioengineering, Schools of Engineering and Medicine, Stanford University, Shriram Center for Bioengineering and Chemical Engineering, 443 via Ortega, Room 036, Stanford, CA 94305, USA
- Publication Date:
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1604834
- Resource Type:
- Published Article
- Journal Name:
- Bioinformatics
- Additional Journal Information:
- Journal Name: Bioinformatics; Journal Volume: 36; Journal Issue: 6; Journal ID: ISSN 1367-4803
- Publisher:
- Oxford University Press
- Country of Publication:
- United Kingdom
- Language:
- English
Citation Formats
Eguchi, Raphael R., Huang, Po-Ssu, and Valencia, ed., Alfonso. Multi-scale structural analysis of proteins by deep semantic segmentation. United Kingdom: N. p., 2019. Web. doi:10.1093/bioinformatics/btz650.
Eguchi, Raphael R., Huang, Po-Ssu, & Valencia, ed., Alfonso. Multi-scale structural analysis of proteins by deep semantic segmentation. United Kingdom. https://doi.org/10.1093/bioinformatics/btz650
Eguchi, Raphael R., Huang, Po-Ssu, and Valencia, ed., Alfonso. 2019. "Multi-scale structural analysis of proteins by deep semantic segmentation". United Kingdom. https://doi.org/10.1093/bioinformatics/btz650.
@article{osti_1604834,
title = {Multi-scale structural analysis of proteins by deep semantic segmentation},
author = {Eguchi, Raphael R. and Huang, Po-Ssu and Valencia, ed., Alfonso},
abstractNote = {Abstract Motivation Recent advances in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation—a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structure quality assessment. Results We train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model achieves a high per-residue accuracy of 90.8% on the test set (95.0% average per-class accuracy; 87.8% average per-structure accuracy). We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design. Availability and implementation The trained classifier network, parser network, and entropy calculation scripts are available for download at https://git.io/fp6bd, with detailed usage instructions provided at the download page. A step-by-step tutorial for setup is provided at https://goo.gl/e8GB2S. All Rosetta commands, RosettaRemodel blueprints, and predictions for all datasets used in the study are available in the Supplementary Information. 
Supplementary information Supplementary data are available at Bioinformatics online.},
doi = {10.1093/bioinformatics/btz650},
journal = {Bioinformatics},
number = 6,
volume = 36,
place = {United Kingdom},
year = {2019},
month = {8}
}
https://doi.org/10.1093/bioinformatics/btz650
Works referenced in this record:
The Protein Folding Problem
journal, June 2008
- Dill, Ken A.; Ozkan, S. Banu; Shell, M. Scott
- Annual Review of Biophysics, Vol. 37, Issue 1
Efficient flexible backbone protein–protein docking for challenging targets
journal, April 2018
- Marze, Nicholas A.; Roy Burman, Shourya S.; Sheffler, William
- Bioinformatics, Vol. 34, Issue 20
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
journal, January 2017
- Wang, Sheng; Sun, Siqi; Li, Zhen
- PLOS Computational Biology, Vol. 13, Issue 1
CATH: an expanded resource to predict protein function through structure and sequence
journal, November 2016
- Dawson, Natalie L.; Lewis, Tony E.; Das, Sayoni
- Nucleic Acids Research, Vol. 45, Issue D1
Kemp elimination catalysts by computational enzyme design
journal, March 2008
- Röthlisberger, Daniela; Khersonsky, Olga; Wollacott, Andrew M.
- Nature, Vol. 453, Issue 7192, p. 190-195
Exploring the repeat protein universe through computational protein design
journal, December 2015
- Brunette, Tj; Parmeggiani, Fabio; Huang, Po-Ssu
- Nature, Vol. 528, Issue 7583
Principles for designing ideal protein structures
journal, November 2012
- Koga, Nobuyasu; Tatsumi-Koga, Rie; Liu, Gaohua
- Nature, Vol. 491, Issue 7423
Computational Design of Self-Assembling Protein Nanomaterials with Atomic Level Accuracy
journal, May 2012
- King, N. P.; Sheffler, W.; Sawaya, M. R.
- Science, Vol. 336, Issue 6085
Principles for designing proteins with cavities formed by curved β sheets
journal, January 2017
- Marcos, Enrique; Basanta, Benjamin; Chidyausiku, Tamuka M.
- Science, Vol. 355, Issue 6321
The coming of age of de novo protein design
journal, September 2016
- Huang, Po-Ssu; Boyken, Scott E.; Baker, David
- Nature, Vol. 537, Issue 7620
De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy
journal, November 2015
- Huang, Po-Ssu; Feldmeier, Kaspar; Parmeggiani, Fabio
- Nature Chemical Biology, Vol. 12, Issue 1
Tertiary alphabet for the observable protein structural universe
journal, November 2016
- Mackenzie, Craig O.; Zhou, Jianfu; Grigoryan, Gevorg
- Proceedings of the National Academy of Sciences, Vol. 113, Issue 47
DeepSF: deep convolutional neural network for mapping protein sequences to folds
journal, December 2017
- Hou, Jie; Adhikari, Badri; Cheng, Jianlin
- Bioinformatics, Vol. 34, Issue 8
De novo design of a non-local β-sheet protein with high stability and accuracy
journal, October 2018
- Marcos, Enrique; Chidyausiku, Tamuka M.; McShan, Andrew C.
- Nature Structural & Molecular Biology, Vol. 25, Issue 11
Computational Design of Proteins Targeting the Conserved Stem Region of Influenza Hemagglutinin
journal, May 2011
- Fleishman, S. J.; Whitehead, T. A.; Ekiert, D. C.
- Science, Vol. 332, Issue 6031
Toward High-Resolution de Novo Structure Prediction for Small Proteins
journal, September 2005
- Bradley, P.
- Science, Vol. 309, Issue 5742
How Protein Stability and New Functions Trade Off
journal, February 2008
- Tokuriki, Nobuhiko; Stricher, Francois; Serrano, Luis
- PLoS Computational Biology, Vol. 4, Issue 2
3D deep convolutional neural networks for amino acid environment similarity analysis
journal, June 2017
- Torng, Wen; Altman, Russ B.
- BMC Bioinformatics, Vol. 18, Issue 1
De Novo Computational Design of Retro-Aldol Enzymes
journal, March 2008
- Jiang, L.; Althoff, E. A.; Clemente, F. R.
- Science, Vol. 319, Issue 5868
A Potent and Broad Neutralizing Antibody Recognizes and Penetrates the HIV Glycan Shield
journal, October 2011
- Pejchal, R.; Doores, K. J.; Walker, L. M.
- Science, Vol. 334, Issue 6059
Protein stability promotes evolvability
journal, March 2006
- Bloom, J. D.; Labthavikul, S. T.; Otey, C. R.
- Proceedings of the National Academy of Sciences, Vol. 103, Issue 15, p. 5869-5874
Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information
journal, May 2014
- Ovchinnikov, Sergey; Kamisetty, Hetunandan; Baker, David
- eLife, Vol. 3
Computational design of ligand-binding proteins with high affinity and selectivity
journal, September 2013
- Tinberg, Christine E.; Khare, Sagar D.; Dou, Jiayi
- Nature, Vol. 501, Issue 7466, p. 212-216
Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home
journal, January 2007
- Das, Rhiju; Qian, Bin; Raman, Srivatsan
- Proteins: Structure, Function, and Bioinformatics, Vol. 69, Issue S8
Global analysis of protein folding using massively parallel design, synthesis, and testing
journal, July 2017
- Rocklin, Gabriel J.; Chidyausiku, Tamuka M.; Goreshnik, Inna
- Science, Vol. 357, Issue 6347
SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of new structures
journal, December 2013
- Fox, Naomi K.; Brenner, Steven E.; Chandonia, John-Marc
- Nucleic Acids Research, Vol. 42, Issue D1
RosettaRemodel: A Generalized Framework for Flexible Backbone Protein Design
journal, August 2011
- Huang, Po-Ssu; Ban, Yih-En Andrew; Richter, Florian
- PLoS ONE, Vol. 6, Issue 8
Protein homology model refinement by large-scale energy optimization
journal, March 2018
- Park, Hahnbeom; Ovchinnikov, Sergey; Kim, David E.
- Proceedings of the National Academy of Sciences, Vol. 115, Issue 12
Critical assessment of methods of protein structure prediction (CASP)-Round XII
journal, December 2017
- Moult, John; Fidelis, Krzysztof; Kryshtafovych, Andriy
- Proteins: Structure, Function, and Bioinformatics, Vol. 86
Design of self-assembling transmembrane helical bundles to elucidate principles required for membrane protein folding and ion transport
journal, June 2017
- Joh, Nathan H.; Grigoryan, Gevorg; Wu, Yibing
- Philosophical Transactions of the Royal Society B: Biological Sciences, Vol. 372, Issue 1726
Principles that Govern the Folding of Protein Chains
journal, July 1973
- Anfinsen, C. B.
- Science, Vol. 181, Issue 4096
Engineering an Artificial Flavoprotein Magnetosensor
journal, December 2016
- Bialas, Chris; Jarocha, Lauren E.; Henbest, Kevin B.
- Journal of the American Chemical Society, Vol. 138, Issue 51
De Novo Design and Experimental Characterization of Ultrashort Self-Associating Peptides
journal, July 2014
- Smadbeck, James; Chan, Kiat Hwa; Khoury, George A.
- PLoS Computational Biology, Vol. 10, Issue 7
De novo design of a fluorescence-activating β-barrel
journal, September 2018
- Dou, Jiayi; Vorobieva, Anastassia A.; Sheffler, William
- Nature, Vol. 561, Issue 7724