Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function
Abstract

Motivation: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining protein function is still a time-consuming, low-throughput, and expensive process, which has led to a large protein sequence-function gap. It is therefore important to develop computational methods that accurately predict protein function to fill this gap. Although many methods predict function from protein sequences, far fewer leverage protein structures, because accurate structures were unavailable for most proteins until recently.

Results: We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective at leveraging protein sequences and structures to improve protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy.

Availability and implementation: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.
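The idea of passing per-residue sequence embeddings over a structure graph with an equivariant update can be illustrated with a small sketch. This is not the TransFun implementation: the function names `contact_edges` and `egnn_layer` are hypothetical, random arrays stand in for ESM embeddings and AlphaFold2 coordinates, the 10 Å cutoff is an assumed value, and single tanh-activated linear maps replace the learned networks a real model would use. It only shows, under those assumptions, one simplified E(n)-equivariant message-passing step in NumPy.

```python
import numpy as np

def contact_edges(coords, cutoff=10.0):
    """Directed edges between residues closer than `cutoff` (assumed value, in angstroms)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)  # drop self-loops
    return np.nonzero(mask)

def egnn_layer(h, x, src, dst, params):
    """One simplified equivariant step: features h (N, D), coordinates x (N, 3)."""
    W_e, W_x, W_h = params
    n = len(h)
    rel = x[src] - x[dst]                                  # relative positions, (E, 3)
    d2 = np.sum(rel ** 2, axis=1, keepdims=True)           # squared distances, invariant
    # Messages depend only on invariant quantities (features and distances).
    m = np.tanh(np.concatenate([h[src], h[dst], d2], axis=1) @ W_e)
    # Coordinates move along relative position vectors, scaled by an invariant gate.
    gate = np.tanh(m @ W_x)                                # (E, 1)
    shift = np.zeros_like(x)
    np.add.at(shift, src, rel * gate)
    degree = np.maximum(np.bincount(src, minlength=n), 1)[:, None]
    x_new = x + shift / degree
    # Node features are updated from the summed incoming messages.
    m_sum = np.zeros((n, m.shape[1]))
    np.add.at(m_sum, src, m)
    h_new = h + np.tanh(np.concatenate([h, m_sum], axis=1) @ W_h)
    return h_new, x_new

rng = np.random.default_rng(0)
n_res, feat_dim, msg_dim = 6, 8, 16
h = rng.normal(size=(n_res, feat_dim))     # placeholder for language-model embeddings
x = rng.normal(size=(n_res, 3)) * 5.0      # placeholder for predicted C-alpha coordinates
params = (
    rng.normal(scale=0.1, size=(2 * feat_dim + 1, msg_dim)),
    rng.normal(scale=0.1, size=(msg_dim, 1)),
    rng.normal(scale=0.1, size=(feat_dim + msg_dim, feat_dim)),
)
src, dst = contact_edges(x)
h1, x1 = egnn_layer(h, x, src, dst, params)

# Apply a random orthogonal transform and translation to the input coordinates.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)
h1_rot, x1_rot = egnn_layer(h, x @ Q.T + t, src, dst, params)

print(np.allclose(h1, h1_rot))             # features are unchanged by the transform
print(np.allclose(x1 @ Q.T + t, x1_rot))   # coordinates transform along with the input
```

The two checks at the end express the property that makes such layers attractive for predicted structures: the output features do not depend on how the protein happens to be oriented or positioned in space.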
- Sponsoring Organization:
- USDOE
- Grant/Contract Number:
- AR0001213; SC0020400; SC0021303
- OSTI ID:
- 1987726
- Journal Information:
- Journal Name: Bioinformatics; Journal Volume: 39; Journal Issue: Supplement_1; ISSN 1367-4803
- Publisher:
- Oxford University Press
- Country of Publication:
- United Kingdom
- Language:
- English
Similar Records
Geometry-complete perceptron networks for 3D molecular graphs
A gated graph transformer for protein complex structure quality assessment and its performance in CASP15