DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: HDBind: encoding of molecular structure with hyperdimensional binary representations

Journal Article · Scientific Reports
 [1];  [2];  [2];  [3];  [3];  [3];  [3];  [3];  [2];  [3]
  1. Univ. of California, San Diego, CA (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  2. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
  3. Univ. of California, San Diego, CA (United States)

Traditional methods for identifying “hit” molecules from a large collection of potential drug-like candidates rely on biophysical theory to compute approximations to the Gibbs free energy of the binding interaction between the drug and its protein target. These approaches have a significant limitation: they require exceptional computing capabilities for even relatively small collections of molecules. Increasingly large and complex state-of-the-art deep learning approaches have gained popularity with the promise of improving the productivity of drug design, a process notorious for its numerous failures. However, as deep learning models grow in size and complexity, accelerating them at the hardware level becomes more challenging. Hyperdimensional Computing (HDC) has recently gained attention in the computer hardware community due to its algorithmic simplicity relative to deep learning approaches. The HDC learning paradigm, which represents data with high-dimensional binary vectors, uses low-precision binary vector arithmetic to build models of the data that can be learned without the gradient-based optimization required by many conventional machine learning and deep learning methods. This algorithmic simplicity enables hardware acceleration, which has previously been demonstrated in a range of application areas (computer vision, bioinformatics, mass spectrometry, remote sensing, edge devices, etc.). To the best of our knowledge, our work is the first to consider HDC for the task of fast and efficient screening of modern drug-like compound libraries. We also propose the first HDC graph-based encoding methods for molecular data, demonstrating consistent and substantial improvement over previous work. We compare our approaches with alternatives on the well-studied MoleculeNet dataset and the recently proposed LIT-PCBA dataset, derived from high-quality PubChem assays.
We demonstrate our methods on multiple target hardware platforms, including Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), showing at least an order-of-magnitude improvement in energy efficiency over even our smallest neural network baseline, a model with a single hidden layer. Our work thus motivates further investigation into molecular representation learning to develop ultra-efficient pre-screening tools. We make our code publicly available at https://github.com/LLNL/hdbind.
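The abstract describes the HDC paradigm only at a high level. The pure-Python sketch below illustrates the generic HDC recipe it refers to: random binary item vectors, binding by XOR, bundling by bitwise majority, single-pass prototype training with no gradients, and classification by Hamming similarity. It is not HDBind's encoder. The dimension D = 4096, the random tie-breaking rule, the atom-plus-bond graph encoding, the helper names (`encode_molecule`, `train`, `predict`), and the toy "active"/"inactive" molecules are all hypothetical choices made for illustration; the authors' actual encodings are in the code released with the paper.

```python
import random

D = 4096  # hypervector width in bits (an assumed value for this sketch)
MASK = (1 << D) - 1


def random_hv(rng: random.Random) -> int:
    """Draw a dense random binary hypervector, packed into a Python int."""
    return rng.getrandbits(D)


def bind(a: int, b: int) -> int:
    """Bind by bitwise XOR: self-inverse, and for random inputs dissimilar to both."""
    return a ^ b


def permute(a: int, k: int = 1) -> int:
    """Rotate the D-bit vector left by k positions (cyclic shift)."""
    k %= D
    return ((a << k) | (a >> (D - k))) & MASK


def bundle(hvs: list[int], rng: random.Random) -> int:
    """Bundle by bitwise majority vote; the result stays measurably similar to each input."""
    if len(hvs) % 2 == 0:
        # An extra random vote breaks ties at random for even-sized bundles.
        hvs = hvs + [random_hv(rng)]
    counts = [0] * D
    for h in hvs:
        for i, bit in enumerate(format(h, f"0{D}b")):
            if bit == "1":
                counts[i] += 1
    return int("".join("1" if 2 * c > len(hvs) else "0" for c in counts), 2)


def similarity(a: int, b: int) -> float:
    """Fraction of matching bits: 1 - popcount(a XOR b) / D."""
    return 1.0 - bin(a ^ b).count("1") / D


def encode_molecule(atoms, bonds, codebook, rng) -> int:
    """Hypothetical graph encoding: bundle one vector per atom and one per bond.

    A bond is bind(hv[x], permute(hv[y])) with x <= y the sorted element symbols,
    so it ignores bond direction and a C-C bond does not cancel to all zeros.
    """
    items = [codebook[sym] for sym in atoms]
    for u, v in bonds:
        x, y = sorted((atoms[u], atoms[v]))
        items.append(bind(codebook[x], permute(codebook[y])))
    return bundle(items, rng)


def train(dataset, codebook, rng) -> dict:
    """One pass, no gradients: each class prototype is the bundle of its members."""
    members = {}
    for atoms, bonds, label in dataset:
        members.setdefault(label, []).append(encode_molecule(atoms, bonds, codebook, rng))
    return {label: bundle(hvs, rng) for label, hvs in members.items()}


def predict(atoms, bonds, prototypes, codebook, rng) -> str:
    """Return the label whose prototype is closest in Hamming distance."""
    query = encode_molecule(atoms, bonds, codebook, rng)
    return max(prototypes, key=lambda label: similarity(query, prototypes[label]))


# Toy usage with made-up molecules (element symbols plus bond index pairs).
rng = random.Random(0)
codebook = {sym: random_hv(rng) for sym in ("C", "N", "O", "S")}
toy_data = [
    (["N", "O"], [(0, 1)], "active"),
    (["N", "N", "O"], [(0, 1), (1, 2)], "active"),
    (["O", "N", "O"], [(0, 1), (1, 2)], "active"),
    (["C", "C"], [(0, 1)], "inactive"),
    (["C", "S", "C"], [(0, 1), (1, 2)], "inactive"),
    (["S", "C"], [(0, 1)], "inactive"),
]
prototypes = train(toy_data, codebook, rng)
print(predict(["N", "O", "N"], [(0, 1), (1, 2)], prototypes, codebook, rng))  # active
```

Every step here reduces to XOR, rotation, bit counting, and comparison on fixed-width bit vectors, which is why this style of model maps naturally onto GPU and FPGA logic. The per-bit counting loop in `bundle` is written for readability rather than speed.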

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
American Heart Association; Defense Advanced Research Projects Agency (DARPA); USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC52-07NA27344
OSTI ID:
2504640
Report Number(s):
CRADA TC02274-4; LLNL--JRNL-847376; 1071301
Journal Information:
Scientific Reports, Vol. 14, Issue 1; ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English

References (71)

AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading journal January 2009
Surflex-Dock 2.1: Robust performance from ligand energetic modeling, ring flexibility, and knowledge-based search journal March 2007
Recommendations for evaluation of computational methods journal March 2008
Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors journal January 2009
Generating Multibillion Chemical Space of Readily Accessible Screening Compounds journal November 2020
Structure- and Ligand-Based Virtual Screening on DUD-E+: Performance Dependence on Approximations to the Binding Pocket journal April 2020
LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening journal April 2020
SMILES Pair Encoding: A Data-Driven Substructure Tokenization Algorithm for Deep Learning journal March 2021
Improved Protein–Ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference journal March 2021
AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings journal July 2021
True Accuracy of Fast Scoring Functions to Predict High-Throughput Screening Data from Docking Poses: The Simpler the Better journal June 2021
High-Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Noncovalent Inhibitor journal November 2021
AtomNet PoseRanker: Enriching Ligand Pose Quality for Dynamic Proteins in Virtual High-Throughput Screens journal March 2022
Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization journal April 2018
Comparative Assessment of Scoring Functions: The CASF-2016 Update journal November 2018
In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening journal October 2018
Analyzing Learned Molecular Representations for Property Prediction journal July 2019
AMPL: A Data-Driven Modeling Pipeline for Drug Discovery journal April 2020
Accelerators for Classical Molecular Dynamics Simulations of Biomolecules journal June 2022
Uni-Dock: GPU-Accelerated Docking Enables Ultralarge Virtual Screening journal April 2023
InteractionGraphNet: A Novel and Efficient Deep Graph Representation Learning Framework for Accurate Protein–Ligand Interaction Predictions journal December 2021
On the Frustration to Predict Binding Affinities from Protein–Ligand Structures with Deep Neural Networks journal May 2022
HyperSpec: Ultrafast Mass Spectra Clustering in Hyperdimensional Space journal May 2023
Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery journal May 2020
The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. journal May 1965
SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules journal February 1988
A Discussion of Measures of Enrichment in Virtual Screening: Comparing the Information Content of Descriptors with Increasing Levels of Sophistication journal August 2005
Extended-Connectivity Fingerprints journal April 2010
MM/GBSA Binding Energy Prediction on the PDBbind Data Set: Successes, Failures, and Directions for Further Improvement (Greenidge, Paulette A.; Kramer, Christian; Mozziconacci, Jean-Christophe; Journal of Chemical Information and Modeling, Vol. 53, Issue 1; https://doi.org/10.1021/ci300425v) journal December 2012
Encoding Protein–Ligand Interaction Patterns in Fingerprints and Graphs journal March 2013
Computing Clinically Relevant Binding Free Energies of HIV-1 Protease Inhibitors journal February 2014
Benchmarking Sets for Molecular Docking journal November 2006
Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking journal July 2012
Learning representations by back-propagating errors journal October 1986
Rethinking drug design in the artificial intelligence era journal December 2019
Achieving software-equivalent accuracy for hyperdimensional computing with ferroelectric-based in-memory computing journal November 2022
AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection journal February 2023
Molecular contrastive learning of representations via graph neural networks journal March 2022
Large-scale chemical language representations capture molecular structure and properties journal December 2022
MoleculeNet: a benchmark for molecular machine learning journal January 2018
Development and evaluation of a deep learning model for protein–ligand binding affinity prediction journal May 2018
PubChem 2023 update journal October 2022
The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods journal November 2023
Neuromorphic High Dimensional Computing Architecture for Classification Applications conference April 2021
Holographic reduced representations journal May 1995
XCelHD: An Efficient GPU-Powered Hyperdimensional Computing with Parallelized Training conference January 2022
MoleHD: Efficient Drug Discovery using Brain Inspired Hyperdimensional Computing conference December 2022
RelHD: A Graph-based Learning on FeFET with Hyperdimensional Computing conference October 2022
HyperMetric: Robust Hyperdimensional Computing on Error-prone Memories using Metric Learning conference November 2023
HD2FPGA: Automated Framework for Accelerating Hyperdimensional Computing on FPGAs conference April 2023
Efficient Biosignal Processing Using Hyperdimensional Computing: Network Templates for Combined Learning and Classification of ExG Signals journal January 2019
Classification Using Hyperdimensional Computing: A Review journal January 2020
Accelerating Hyperdimensional Computing on FPGAs by Exploiting Computational Reuse journal August 2020
OpenHD: A GPU-Powered Framework for Hyperdimensional Computing journal November 2022
High-Dimensional Computing as a Nanoscalable Paradigm journal September 2017
Sequence Prediction With Sparse Distributed Hyperdimensional Coding Applied to the Analysis of Mobile Phone Use Patterns journal September 2016
Accurate prediction of protein structures and interactions using a three-track neural network journal July 2021
Learning sensorimotor control with neuromorphic sensors: Toward hyperdimensional active perception journal May 2019
Green AI journal November 2020
Thrifty conference November 2020
High-throughput virtual screening of small molecule inhibitors for SARS-CoV-2 protein targets with deep fusion models (Stevenson, Garrett A.; Jones, Derek; Kim, Hyojin; Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis; https://doi.org/10.1145/3458817.3476193) conference November 2021
HDnn-PIM: Efficient in Memory Design of Hyperdimensional Computing with Feature Extraction conference June 2022
Random projection in dimensionality reduction: applications to image and text data conference January 2001
Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance journal October 2016
Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening journal August 2019
Molecular Property Prediction: A Multilevel Quantum Interactions Modeling Perspective journal July 2019
GeomGCL: Geometric Graph Contrastive Learning for Molecular Property Prediction journal June 2022
A Theoretical Perspective on Hyperdimensional Computing journal October 2021
Laelaps: An Energy-Efficient Seizure Detection Algorithm from Long-term Human iEEG Recordings without False Alarms conference March 2019
SpecHD: Hyperdimensional Computing Framework for FPGA-Based Mass Spectrometry Clustering conference March 2024
Discovery of Small-Molecule Inhibitors of SARS-CoV-2 Proteins Using a Computational and Experimental Pipeline journal July 2021