This dataset contains ligand conformations and docking scores for 1.4 billion molecules docked against 6 structural targets from SARS-CoV2, representing 5 unique proteins: MPro, NSP15, PLPro, RDRP, and the Spike protein. Docking was carried out using the AutoDock-GPU platform on the Summit supercomputer and Google Cloud. The docking procedure employed the Solis Wets search method to generate 20 independent ligand binding poses per compound. Each compound geometry was scored using the AutoDock free energy estimate, and rescored using RFScore v3 and DUD-E machine-learned rescoring models. Input protein structures are included, suitable for use by AutoDock-GPU and other docking programs. As the result of an exceptionally large docking campaign, this dataset represents a valuable resource for discovering trends across small molecule and protein binding sites, training AI models, and comparing to inhibitor compounds targeting SARS-CoV-2. The work also gives an example of how to organize and process data from ultra-large docking screens.
Rogers, David M., Agarwal, Rupesh, Vermaas, Josh V., et al., "SARS-CoV2 billion-compound docking," Scientific Data 10, no. 1 (2023), https://doi.org/10.1038/s41597-023-01984-9
@article{osti_1963757,
author = {Rogers, David M. and Agarwal, Rupesh and Vermaas, Josh V. and Smith, Micholas Dean and Rajeshwar, Rajitha T. and Cooper, Connor and Sedova, Ada and Boehm, Swen and Baker, Matthew and Glaser, Jens and others},
title = {SARS-CoV2 billion-compound docking},
annote = {This dataset contains ligand conformations and docking scores for 1.4 billion molecules docked against 6 structural targets from SARS-CoV2, representing 5 unique proteins: MPro, NSP15, PLPro, RDRP, and the Spike protein. Docking was carried out using the AutoDock-GPU platform on the Summit supercomputer and Google Cloud. The docking procedure employed the Solis Wets search method to generate 20 independent ligand binding poses per compound. Each compound geometry was scored using the AutoDock free energy estimate, and rescored using RFScore v3 and DUD-E machine-learned rescoring models. Input protein structures are included, suitable for use by AutoDock-GPU and other docking programs. As the result of an exceptionally large docking campaign, this dataset represents a valuable resource for discovering trends across small molecule and protein binding sites, training AI models, and comparing to inhibitor compounds targeting SARS-CoV-2. The work also gives an example of how to organize and process data from ultra-large docking screens.},
doi = {10.1038/s41597-023-01984-9},
url = {https://www.osti.gov/biblio/1963757},
journal = {Scientific Data},
issn = {2052-4463},
number = {1},
volume = {10},
place = {United Kingdom},
publisher = {Nature Publishing Group},
year = {2023},
month = {03}}