DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: ECNet is an evolutionary context-integrated deep learning framework for protein engineering

Journal Article · Nature Communications

Abstract: Machine learning is increasingly used for protein engineering. However, the accuracy of existing machine learning algorithms is limited because the general sequence contexts they capture are not specific to the protein being engineered. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. The algorithm integrates two sources of information: a local evolutionary context, derived from homologous sequences, that explicitly models residue-residue epistasis for the protein of interest, and a global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and generalizes from low-order mutants to higher-order mutants. Using ~50 deep mutational scanning and random mutagenesis datasets, we show that ECNet predicts the sequence-function relationship more accurately than existing machine learning algorithms. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance with high success rates.
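
As a rough illustration of what integrating a local and a global evolutionary context could look like, the sketch below builds per-residue features for one sequence. It is not the authors' implementation. It assumes the local context comes from a Potts-style model with site terms `fields` and pairwise terms `couplings`, and that the global context is a fixed per-residue embedding from some pretrained protein language model. All names, shapes, and the mean-pool-plus-linear predictor are hypothetical, and the parameters are random stand-ins rather than fitted values.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def local_features(seq, fields, couplings):
    """Local context for each residue i: its site term plus the pairwise
    term between residue i and every position j of the same sequence.

    fields:    array of shape (n, 20)         (assumed Potts site terms)
    couplings: array of shape (n, n, 20, 20)  (assumed Potts pair terms)
    """
    idx = np.array([AA_INDEX[aa] for aa in seq])
    pos = np.arange(len(seq))
    site = fields[pos, idx]                               # shape (n,)
    # pair[i, j] = couplings[i, j, seq[i], seq[j]] via broadcast indexing
    pair = couplings[pos[:, None], pos[None, :], idx[:, None], idx[None, :]]
    return np.concatenate([site[:, None], pair], axis=1)  # shape (n, n + 1)


def combined_features(seq, fields, couplings, global_embedding):
    """Concatenate local and global context residue by residue."""
    local = local_features(seq, fields, couplings)
    return np.concatenate([local, global_embedding], axis=1)


def predict_fitness(features, weights):
    """Placeholder predictor (mean-pool over residues, then a linear map).
    A trained neural network would take its place in practice."""
    return float(features.mean(axis=0) @ weights)


# Toy example with random parameters (illustration only, no real data).
rng = np.random.default_rng(0)
seq = "MKTAYIAK"
n = len(seq)
fields = rng.normal(size=(n, 20))
couplings = rng.normal(size=(n, n, 20, 20))
embedding = rng.normal(size=(n, 16))       # hypothetical embedding width

features = combined_features(seq, fields, couplings, embedding)
weights = rng.normal(size=features.shape[1])
score = predict_fitness(features, weights)
print(features.shape, score)
```

In this sketch the pairwise terms are what let a double mutant's features differ from the sum of its single-mutant features, which is one way to read the abstract's point about epistasis and generalization to higher-order mutants.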

Sponsoring Organization:
USDOE
Grant/Contract Number:
NONE; SC0018420
OSTI ID:
1823200
Journal Information:
Nature Communications, Vol. 12, Issue 1; ISSN 2041-1723
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
