U.S. Department of Energy
Office of Scientific and Technical Information

Language models for the prediction of SARS-CoV-2 inhibitors

Journal Article · International Journal of High Performance Computing Applications

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
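The generate-and-score search described in the abstract can be illustrated with a minimal genetic-algorithm sketch. This is not the authors' implementation: `toy_score` is a hypothetical stand-in for the fine-tuned binding-affinity predictor, `toy_mutate` is a hypothetical stand-in for language-model generation, and the candidate strings are toy token sequences rather than validated SMILES. Only the overall loop (score, keep the best, generate variants, repeat) is meant to carry over.

```python
import random

# Hypothetical toy alphabet; real candidates would be SMILES strings.
ALPHABET = "CNO"


def toy_score(candidate: str) -> float:
    """Hypothetical stand-in for a learned binding-affinity predictor."""
    return candidate.count("N") + 0.5 * candidate.count("O")


def toy_mutate(candidate: str, rng: random.Random) -> str:
    """Hypothetical stand-in for model-based generation: replace one token."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def evolve(population, score, mutate, generations=20, survivors=5, seed=0):
    """Elitist genetic search: keep the top scorers, refill with mutants."""
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(generations):
        # Deduplicate while preserving order, then rank by score.
        unique = list(dict.fromkeys(pop))
        parents = sorted(unique, key=score, reverse=True)[:survivors]
        # Refill the population with mutated copies of surviving parents.
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(len(pop) - len(parents))
        ]
        pop = parents + children
    return sorted(dict.fromkeys(pop), key=score, reverse=True)


if __name__ == "__main__":
    start = ["CCCC", "CCOC", "CNCC", "CCCO", "OCCC", "CCCN"]
    ranked = evolve(start, toy_score, toy_mutate, generations=10, survivors=3)
    print(ranked[0], toy_score(ranked[0]))
```

Because the surviving parents are carried into each new generation unchanged, the best score in the population never decreases. In a realistic setting both placeholder functions would be replaced by model calls, and generated candidates would need a chemical validity check before scoring.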

Sponsoring Organization:
USDOE
OSTI ID:
1891374
Alternate ID(s):
OSTI ID: 1892426
Journal Information:
International Journal of High Performance Computing Applications, Vol. 36, Issue 5-6; ISSN 1094-3420
Publisher:
SAGE Publications
Country of Publication:
United States
Language:
English

Similar Records

Language Models for the Prediction of SARS-CoV-2 Inhibitors
Conference · October 1, 2022 · International Journal of High Performance Computing Applications · OSTI ID: 1892426

SARS-CoV2 billion-compound docking
Journal Article · March 27, 2023 · Scientific Data · OSTI ID: 1963757