DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: SARS-CoV2 billion-compound docking

Journal Article · Scientific Data

This dataset contains ligand conformations and docking scores for 1.4 billion molecules docked against 6 structural targets from SARS-CoV-2, representing 5 unique proteins: MPro, NSP15, PLPro, RDRP, and the Spike protein. Docking was carried out with AutoDock-GPU on the Summit supercomputer and on Google Cloud. The docking procedure used the Solis-Wets search method to generate 20 independent ligand binding poses per compound. Each pose was scored with the AutoDock free-energy estimate and rescored with the RFScore v3 and DUD-E machine-learned rescoring models. The input protein structures are included in a form suitable for AutoDock-GPU and other docking programs. As the product of an exceptionally large docking campaign, the dataset is a resource for discovering trends across small molecules and protein binding sites, for training AI models, and for comparison with inhibitor compounds targeting SARS-CoV-2. The work also gives an example of how to organize and process data from ultra-large docking screens.
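
As a minimal sketch of one processing step of the kind the abstract mentions (not the authors' pipeline), the Python below streams pose records, keeps the best AutoDock score among each compound's poses, and retains only the k most favorable compounds per target in a bounded heap. The Pose record and its field names are hypothetical rather than the dataset's actual file layout, and the code assumes that all poses for one compound against one target arrive consecutively.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Pose:
    """One docked pose. Field names are hypothetical, not the dataset's schema."""
    target: str       # e.g. "MPro" or "PLPro"
    compound_id: str  # hypothetical compound identifier
    score: float      # AutoDock free-energy estimate in kcal/mol; lower is better


def top_k_per_target(poses: Iterable[Pose], k: int) -> Dict[str, List[Tuple[float, str]]]:
    """Return the k best-scoring compounds per target as (score, compound_id), best first.

    Assumes every pose of a given (target, compound) pair arrives consecutively,
    so memory holds one compound's poses plus at most k heap entries per target.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Per-target max-heap of the k best scores, emulated by storing negated scores
    # in heapq's min-heap: heap[0] is then the worst score still being kept.
    heaps: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for (target, cid), group in groupby(poses, key=lambda p: (p.target, p.compound_id)):
        best = min(p.score for p in group)  # best of this compound's poses (20 in the study)
        heap = heaps[target]
        if len(heap) < k:
            heapq.heappush(heap, (-best, cid))
        elif -best > heap[0][0]:  # strictly better than the worst kept score
            heapq.heapreplace(heap, (-best, cid))
    # Undo the negation and sort ascending, so the most favorable score comes first.
    return {t: sorted((-neg, c) for neg, c in h) for t, h in heaps.items()}


if __name__ == "__main__":
    demo = [
        Pose("MPro", "cmpd-1", -7.2), Pose("MPro", "cmpd-1", -8.1),
        Pose("MPro", "cmpd-2", -6.5),
        Pose("MPro", "cmpd-3", -9.0), Pose("MPro", "cmpd-3", -8.8),
        Pose("PLPro", "cmpd-1", -5.9),
    ]
    print(top_k_per_target(demo, k=2))
    # {'MPro': [(-9.0, 'cmpd-3'), (-8.1, 'cmpd-1')], 'PLPro': [(-5.9, 'cmpd-1')]}
```

The bounded heap keeps memory proportional to k per target rather than to the number of compounds, which is what matters at billion-compound scale. If the input is not already grouped by compound, it would need to be sorted by (target, compound) first, or aggregated with a dictionary at the cost of more memory.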

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1963757
Alternate ID(s):
OSTI ID: 1991742
Journal Information:
Scientific Data, Vol. 10, Issue 1; ISSN 2052-4463
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
