U.S. Department of Energy
Office of Scientific and Technical Information

A General Framework to Learn Tertiary Structure for Protein Sequence Characterization

Journal Article · Frontiers in Bioinformatics
 [1];  [2]
  1. Georgia Institute of Technology, Atlanta, GA (United States)
  2. Georgia Institute of Technology, Atlanta, GA (United States)

During the past five years, deep-learning algorithms have enabled ground-breaking progress towards the prediction of tertiary structure from a protein sequence. We recently developed SAdLSA, a computational algorithm for protein sequence comparison via deep learning of protein structural alignments. SAdLSA shows significant improvement over established sequence alignment methods. In this contribution, we show that SAdLSA provides a general machine-learning framework for structurally characterizing protein sequences. By aligning a protein sequence against itself, SAdLSA generates a fold distogram for the input sequence, including in challenging cases whose structural folds were not present in the training set. About 70% of the predicted distograms are statistically significant. At present, the intra-sequence distogram predicted by SAdLSA self-alignment is less accurate than those produced by deep-learning algorithms trained specifically for distogram prediction. It is nonetheless remarkable that the prediction of single protein structures is encoded by an algorithm that learns ensembles of pairwise structural comparisons, without being explicitly trained to recognize individual structural folds. As such, SAdLSA can not only predict protein folds for individual sequences, but also detect subtle, yet significant, structural relationships between multiple protein sequences using the same deep-learning neural network. Fold prediction for a single sequence thus reduces to a special case of this general framework for protein sequence annotation.
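
For readers unfamiliar with the term, the following minimal sketch illustrates what a distogram represents: a residue-by-residue matrix of binned distances. It is not SAdLSA's method and makes no claim about how the network is built. It computes hard bin indices from known coordinates, whereas a predicted distogram would typically assign probabilities to the bins for each residue pair. The function names, the toy coordinates, and the bin edges are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def to_distogram(dist: Sequence[Sequence[float]],
                 edges: Sequence[float]) -> List[List[int]]:
    """Replace each distance with the index of the bin it falls into.

    Bin i covers distances below edges[i] (and at or above edges[i - 1]).
    The index len(edges) means "beyond the last edge".
    """
    def bin_index(d: float) -> int:
        for i, edge in enumerate(edges):
            if d < edge:
                return i
        return len(edges)

    return [[bin_index(d) for d in row] for row in dist]


# Hypothetical toy chain: four residues on a straight line, 3.8 apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_edges = [4.0, 8.0, 12.0]  # assumed bin edges, for illustration only

toy_distogram = to_distogram(distance_matrix(toy_coords), toy_edges)
for row in toy_distogram:
    print(row)
```

The resulting matrix is symmetric with the closest bin on the diagonal, which matches the self-alignment setting described above: both axes index the same sequence.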

Research Organization:
Georgia Institute of Technology, Atlanta, GA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Institutes of Health (NIH)
Grant/Contract Number:
SC0021303
OSTI ID:
1831097
Journal Information:
Journal Name: Frontiers in Bioinformatics; Vol. 1; ISSN 2673-7647
Publisher:
Frontiers Media S.A.
Country of Publication:
United States
Language:
English
