Language models for the prediction of SARS-CoV-2 inhibitors

Blanchard, Andrew E.; Gounley, John; Bhowmik, Debsindhu; Chandra Shekar, Mayanka; Lyngaas, Isaac; Gao, Shang; Yin, Junqi; Tsaris, Aristeidis; Wang, Feiyi; Glaser, Jens

doi:10.1177/10943420221121804

Journal Article · October 7, 2022 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/10943420221121804 · OSTI ID: 1891374

Blanchard, Andrew E. [1]; Gounley, John [1]; Bhowmik, Debsindhu [1]; Chandra Shekar, Mayanka [1]; Lyngaas, Isaac [1]; Gao, Shang [1]; Yin, Junqi [1]; Tsaris, Aristeidis [1]; Wang, Feiyi [1]; Glaser, Jens [1]

[1] Oak Ridge National Laboratory, Oak Ridge, TN, USA

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
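
The abstract does not spell out the genetic algorithm, so the following is only a minimal sketch of the general idea: a population of token sequences is mutated by masking one position and filling it with a proposed token, and candidates are ranked by a score. Everything here is an assumption made for illustration. `TOKENS`, `propose_token`, `score`, `mutate`, and `evolve` are hypothetical names; the uniform sampler stands in for a masked language model, and the toy score stands in for a fine-tuned binding affinity predictor. The sequences are not checked for chemical validity.

```python
import random

# Hypothetical token alphabet; a real pipeline would use a SMILES tokenizer.
TOKENS = ["C", "N", "O", "c", "n", "F"]

def propose_token(tokens, position, rng):
    """Stand-in for a masked language model: returns a replacement token
    for the masked position. Here it simply samples uniformly."""
    return rng.choice(TOKENS)

def score(tokens):
    """Stand-in for a fine-tuned affinity predictor. This toy score only
    counts nitrogen-like tokens and has no chemical meaning."""
    return sum(1 for t in tokens if t in ("N", "n"))

def mutate(tokens, rng):
    """Mask one random position and fill it with a proposed token."""
    position = rng.randrange(len(tokens))
    child = list(tokens)
    child[position] = propose_token(tokens, position, rng)
    return child

def evolve(population, generations, rng):
    """Each generation: mutate every member, then keep the top scorers
    from the combined pool of parents and children."""
    size = len(population)
    for _ in range(generations):
        children = [mutate(member, rng) for member in population]
        pool = population + children
        pool.sort(key=score, reverse=True)
        population = pool[:size]
    return population

rng = random.Random(0)
start = [[rng.choice(TOKENS) for _ in range(8)] for _ in range(10)]
best_before = max(score(m) for m in start)
final = evolve(start, generations=20, rng=rng)
best_after = max(score(m) for m in final)
```

Because parents stay in the selection pool, the best score in this sketch can never decrease from one generation to the next. Swapping in a real generator and scorer would only require replacing `propose_token` and `score`.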

Sponsoring Organization: USDOE

OSTI ID: 1891374

Alternate ID(s): OSTI ID: 1892426

Journal Information: International Journal of High Performance Computing Applications, Vol. 36, Issue 5-6; ISSN 1094-3420

Publisher: SAGE Publications

Country of Publication: United States

Language: English
