The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
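The abstract's generate-and-score genetic algorithm can be sketched as a simple loop. This is a minimal hypothetical illustration: the `lm_generate_variant` and `lm_score` functions below are toy placeholders standing in for the paper's fine-tuned BERT generator and binding-affinity scorer, which are not reproduced here.

```python
# Hypothetical sketch of a generate-and-score genetic algorithm loop,
# as described in the abstract. The language-model generator and the
# affinity scorer are stubbed with toy placeholders; the actual work
# uses a fine-tuned BERT model for both roles.
import random

def lm_generate_variant(smiles: str) -> str:
    """Placeholder for LM-based generation: mutate one character."""
    i = random.randrange(len(smiles))
    return smiles[:i] + random.choice("CNO") + smiles[i + 1:]

def lm_score(smiles: str) -> float:
    """Placeholder for the fine-tuned affinity scorer (higher is better)."""
    return smiles.count("N") + 0.5 * smiles.count("O")

def genetic_search(seed_pop, generations=10, pop_size=20):
    population = list(seed_pop)
    for _ in range(generations):
        # Generation step: propose new candidates with the language model.
        offspring = [lm_generate_variant(random.choice(population))
                     for _ in range(pop_size)]
        # Scoring + selection step: keep the top-scoring molecules.
        population = sorted(population + offspring,
                            key=lm_score, reverse=True)[:pop_size]
    return population

best = genetic_search(["CCO", "CCC", "CCN"])
```

The design choice to reuse one language model for both proposal and ranking is what lets the search scale: generation and scoring share the same learned chemical representation.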
Blanchard, Andrew E., et al. "Language models for the prediction of SARS-CoV-2 inhibitors." International Journal of High Performance Computing Applications, vol. 36, no. 5-6, Oct. 2022. https://doi.org/10.1177/10943420221121804
Blanchard, Andrew E., Gounley, John, Bhowmik, Debsindhu, Chandra Shekar, Mayanka, Lyngaas, Isaac, Gao, Shang, Yin, Junqi, Tsaris, Aristeidis, Wang, Feiyi, & Glaser, Jens (2022). Language models for the prediction of SARS-CoV-2 inhibitors. International Journal of High Performance Computing Applications, 36(5-6). https://doi.org/10.1177/10943420221121804
Blanchard, Andrew E., Gounley, John, Bhowmik, Debsindhu, et al., "Language models for the prediction of SARS-CoV-2 inhibitors," International Journal of High Performance Computing Applications 36, no. 5-6 (2022), https://doi.org/10.1177/10943420221121804
@article{osti_1891374,
author = {Blanchard, Andrew E. and Gounley, John and Bhowmik, Debsindhu and Chandra Shekar, Mayanka and Lyngaas, Isaac and Gao, Shang and Yin, Junqi and Tsaris, Aristeidis and Wang, Feiyi and Glaser, Jens},
title = {Language models for the prediction of SARS-CoV-2 inhibitors},
annote = {The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.},
doi = {10.1177/10943420221121804},
url = {https://www.osti.gov/biblio/1891374},
journal = {International Journal of High Performance Computing Applications},
issn = {1094-3420},
number = {5-6},
volume = {36},
place = {United States},
publisher = {SAGE Publications},
year = {2022},
month = {10}
}