DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms

Journal Article · Patterns

This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC); USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
2324039
Journal Information:
Patterns, Vol. 5, Issue 4; ISSN 2666-3899
Publisher:
Cell Press
Country of Publication:
United States
Language:
English
