OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function

Journal Article · Bioinformatics

Abstract

Motivation: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining protein function is still a time-consuming, low-throughput, and expensive process, which leaves a large gap between known protein sequences and known functions. Accurate computational methods for predicting protein function are therefore needed to fill this gap. Although many methods use protein sequences as input to predict function, far fewer leverage protein structures, because accurate structures were unavailable for most proteins until recently.

Results: We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences with a pre-trained protein language model (ESM) via transfer learning and combines them, through equivariant graph neural networks, with protein 3D structures predicted by AlphaFold2. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective ways to leverage protein sequences and structures for protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy.

Availability and implementation: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.
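The abstract describes the architecture only at a high level. As a rough illustration of how per-residue sequence embeddings and a predicted structure can be fused with an equivariant graph neural network, here is a minimal NumPy sketch of one E(n)-equivariant message-passing layer in the style of the EGNN of Satorras et al. (2021). That choice is an assumption: TransFun's actual layer design, graph construction, and hyperparameters are not given in this record and may differ. Random arrays stand in for ESM embeddings and AlphaFold2 C-alpha coordinates, and the 10 Å contact cutoff, layer sizes, mean pooling, and the helpers `radius_graph`, `EGNNLayer`, and `predict_go_scores` are all hypothetical.

```python
import numpy as np


def radius_graph(coords, cutoff=10.0):
    """Hypothetical helper: directed edges (i, j) between residues closer than `cutoff` angstroms."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)  # drop self-loops
    recv, send = np.nonzero(mask)  # mask is symmetric, so both directions are included
    return recv, send


def silu(z):
    return z / (1.0 + np.exp(-z))


class EGNNLayer:
    """One E(n)-equivariant layer in the EGNN style (assumed, not TransFun's exact layer)."""

    def __init__(self, feat_dim, hidden_dim, rng):
        # Random single-layer stand-ins for the learned MLPs phi_e, phi_x and phi_h.
        self.W_e = rng.normal(scale=0.1, size=(2 * feat_dim + 1, hidden_dim))
        self.w_x = rng.normal(scale=0.1, size=hidden_dim)
        self.W_h = rng.normal(scale=0.1, size=(feat_dim + hidden_dim, feat_dim))

    def __call__(self, h, x, recv, send):
        n = len(x)
        rel = x[recv] - x[send]                       # x_i - x_j, rotates with the structure
        d2 = np.sum(rel ** 2, axis=1, keepdims=True)  # squared distance, rotation-invariant
        # Edge messages see coordinates only through d2, so they are invariant.
        m = silu(np.concatenate([h[recv], h[send], d2], axis=1) @ self.W_e)
        # Coordinate update: relative vectors scaled by an invariant per-edge weight.
        x_new = x.copy()
        np.add.at(x_new, recv, rel * (m @ self.w_x)[:, None] / max(n - 1, 1))
        # Node update: sum incoming messages, then a residual MLP on [h_i, m_i].
        m_agg = np.zeros((n, m.shape[1]))
        np.add.at(m_agg, recv, m)
        h_new = h + silu(np.concatenate([h, m_agg], axis=1) @ self.W_h)
        return h_new, x_new


def predict_go_scores(h, W_out):
    """Hypothetical head: mean-pool residues, then an independent sigmoid per GO term."""
    logits = h.mean(axis=0) @ W_out
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
n_res, feat_dim, n_terms = 30, 16, 5
h0 = rng.normal(size=(n_res, feat_dim))      # stand-in for per-residue ESM embeddings
x0 = rng.normal(scale=5.0, size=(n_res, 3))  # stand-in for AlphaFold2 C-alpha coordinates
layer = EGNNLayer(feat_dim, hidden_dim=32, rng=rng)
W_out = rng.normal(scale=0.1, size=(feat_dim, n_terms))

recv, send = radius_graph(x0)
h1, x1 = layer(h0, x0, recv, send)
scores = predict_go_scores(h1, W_out)

# Equivariance check: rotate/reflect and translate the structure, then rerun the layer.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)
x0_rot = x0 @ Q.T + t
h1_rot, x1_rot = layer(h0, x0_rot, *radius_graph(x0_rot))
print(np.allclose(h1, h1_rot), np.allclose(x1 @ Q.T + t, x1_rot))
```

Because messages depend on coordinates only through squared distances, moving the predicted structure rigidly leaves the residue features unchanged and carries the updated coordinates along with it, which is what the final check tests. Function prediction is multi-label, so the sketch uses an independent sigmoid per GO term rather than a softmax.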

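This record does not say how TransFun predictions and sequence similarity-based predictions are combined. One common scheme, sketched here purely as an assumption, transfers GO terms from homologs weighted by alignment bitscore (in the spirit of the "DiamondScore" baseline described for DeepGOPlus) and then blends the result linearly with the model's scores. The helper names `similarity_scores` and `combine`, the input dictionaries, and the blending weight `alpha` are hypothetical; in practice `alpha` would be tuned on validation data.

```python
from collections import defaultdict


def similarity_scores(hits, annotations):
    """Hypothetical helper: bitscore-weighted GO transfer from homologs.

    hits: {subject_id: bitscore} for one query, e.g. parsed from DIAMOND output.
    annotations: {subject_id: set of GO term IDs} for annotated reference proteins.
    """
    total = sum(hits.values())
    if total == 0:
        return {}
    scores = defaultdict(float)
    for subject, bitscore in hits.items():
        for term in annotations.get(subject, ()):
            # Each term gets the share of total bitscore carried by homologs annotated with it.
            scores[term] += bitscore / total
    return dict(scores)


def combine(model_scores, sim_scores, alpha=0.5):
    """Hypothetical linear blend: alpha * model + (1 - alpha) * similarity; absent terms score 0."""
    terms = set(model_scores) | set(sim_scores)
    return {
        t: alpha * model_scores.get(t, 0.0) + (1 - alpha) * sim_scores.get(t, 0.0)
        for t in terms
    }


# Toy inputs, made up for illustration.
hits = {"P1": 300.0, "P2": 100.0}
annotations = {"P1": {"GO:0003677", "GO:0005634"}, "P2": {"GO:0003677"}}
model = {"GO:0003677": 0.6, "GO:0016301": 0.4}

sim = similarity_scores(hits, annotations)
final = combine(model, sim, alpha=0.5)
for term, score in sorted(final.items()):
    print(term, round(score, 3))
# prints GO:0003677 0.8, GO:0005634 0.375, GO:0016301 0.2 (one per line)
```

A linear blend lets terms supported by close homologs gain confidence even when the model is unsure, while terms with no homolog evidence fall back to a down-weighted model score.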
Sponsoring Organization:
USDOE
Grant/Contract Number:
AR0001213; SC0020400; SC0021303
OSTI ID:
1987726
Journal Information:
Bioinformatics, Vol. 39, Issue Supplement_1; ISSN 1367-4803
Publisher:
Oxford University Press
Country of Publication:
United Kingdom
Language:
English

Similar Records

3D-equivariant graph neural networks for protein model quality assessment
Journal Article · January 13, 2023 · Bioinformatics

Geometry-complete perceptron networks for 3D molecular graphs
Journal Article · February 19, 2024 · Bioinformatics

A gated graph transformer for protein complex structure quality assessment and its performance in CASP15
Journal Article · June 30, 2023 · Bioinformatics
