High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function
- Georgia Institute of Technology, Atlanta
- University of Idaho
- University of Missouri, Columbia
- Oak Ridge National Laboratory (ORNL)
Computational biology is one of many scientific disciplines ripe for innovation and acceleration through high-performance computing (HPC). In recent years, machine learning has also benefited significantly from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins at the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models on high-throughput data such as proteomics data. We showcase the methodologies our pipeline currently supports and detail future tasks for it to encompass, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE; USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1840182
- Country of Publication: United States
- Language: English