U.S. Department of Energy
Office of Scientific and Technical Information

Language Models for the Prediction of SARS-CoV-2 Inhibitors

Conference · International Journal of High Performance Computing Applications
The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ~9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
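The genetic-algorithm search mentioned in the abstract can be illustrated with a minimal sketch. This record does not describe the actual operators, so everything named here is a hypothetical stand-in: `toy_mutate` takes the place of language-model generation, `toy_score` takes the place of the fine-tuned binding-affinity predictor, and the four-letter `ALPHABET` is a toy token set, not a real molecular vocabulary. The elitist selection step is likewise an assumption chosen for simplicity.

```python
import random
from typing import Callable, List

ALPHABET = "CNOS"  # toy token set (assumption), not a real SMILES vocabulary


def toy_score(candidate: str) -> float:
    """Hypothetical stand-in for a fine-tuned affinity predictor."""
    return float(candidate.count("N"))


def toy_mutate(candidate: str, rng: random.Random) -> str:
    """Hypothetical stand-in for model-based generation: resample one token."""
    pos = rng.randrange(len(candidate))
    return candidate[:pos] + rng.choice(ALPHABET) + candidate[pos + 1:]


def genetic_search(
    population: List[str],
    score: Callable[[str], float],
    mutate: Callable[[str, random.Random], str],
    generations: int = 20,
    seed: int = 0,
) -> List[str]:
    """Generate children, score parents and children, keep the best."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        # Generation step: propose one child per parent.
        children = [mutate(parent, rng) for parent in population]
        # Drop duplicates while preserving order, so results are reproducible.
        pool = list(dict.fromkeys(population + children))
        # Scoring and elitist selection: parents compete with their children,
        # so the best score never decreases between generations.
        pool.sort(key=score, reverse=True)
        population = pool[:size]
    return population


if __name__ == "__main__":
    start = ["CCCCCC", "CCOCCO", "OCCSCC", "SCCCCC"]
    for candidate in genetic_search(start, toy_score, toy_mutate):
        print(candidate, toy_score(candidate))
```

Passing the generator and scorer as plain callables keeps the search loop independent of any particular model, so real generation and scoring functions could be swapped in without changing `genetic_search`.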
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1892426
Conference Information:
Journal Name: International Journal of High Performance Computing Applications
Journal Volume: 2021
Country of Publication:
United States
Language:
English

Similar Records

Language models for the prediction of SARS-CoV-2 inhibitors
Journal Article · Thu Oct 06 20:00:00 EDT 2022 · International Journal of High Performance Computing Applications · OSTI ID:1891374

SARS-CoV2 billion-compound docking
Journal Article · Mon Mar 27 20:00:00 EDT 2023 · Scientific Data · OSTI ID:1963757
