U.S. Department of Energy
Office of Scientific and Technical Information

Automating Genetic Algorithm Mutations for Molecules Using a Masked Language Model

Journal Article · IEEE Transactions on Evolutionary Computation
Inspired by the evolution of biological systems, genetic algorithms have been applied to generate solutions for optimization problems in a variety of scientific and engineering disciplines. For a given problem, a suitable genome representation must be defined, along with a mutation operator that produces subsequent generations. Unlike natural systems, which display a variety of complex rearrangements (e.g., mobile genetic elements), genetic algorithms commonly rely on random point-wise changes alone for mutation. Generalizing beyond point-wise mutations is difficult because useful genome rearrangements depend on the representation and the problem domain. To move beyond the limitations of manually defined point-wise changes, we propose using techniques from masked language models to generate mutations automatically. First, common subsequences within a given population are used to generate a vocabulary. The vocabulary is then used to tokenize each genome. Finally, a masked language model is trained on the tokenized data to generate possible rearrangements (i.e., mutations). To illustrate the proposed strategy, we use string representations of molecules and a genetic algorithm that optimizes for drug-likeness and synthesizability. Our results show that moving beyond random point-wise mutations accelerates genetic algorithm optimization.
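The three steps named in the abstract (vocabulary from common subsequences, tokenization, masked prediction as mutation) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the trained masked language model is replaced here by a simple count table over (left, right) token contexts, and the function names, the `max_len` and `min_count` parameters, and the small SMILES-like population are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy population of SMILES-like strings (illustration only).
population = ["CCO", "CCN", "CCCO", "c1ccccc1O", "c1ccccc1N"]


def build_vocab(population, max_len=4, min_count=2):
    """Single characters plus substrings shared by at least min_count genomes."""
    counts = Counter()
    for genome in population:
        seen = set()
        for n in range(2, max_len + 1):
            for i in range(len(genome) - n + 1):
                seen.add(genome[i:i + n])
        counts.update(seen)  # each substring counted once per genome
    vocab = {ch for genome in population for ch in genome}
    vocab |= {s for s, c in counts.items() if c >= min_count}
    return vocab


def tokenize(genome, vocab, max_len=4):
    """Greedy longest-match tokenization against the vocabulary."""
    tokens, i = [], 0
    while i < len(genome):
        for n in range(min(max_len, len(genome) - i), 0, -1):
            piece = genome[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens


def fit_fill_table(tokenized):
    """Stand-in for the masked language model (assumption): count which
    tokens occur between each (left, right) pair of neighbours."""
    table = {}
    for toks in tokenized:
        padded = ["<s>"] + toks + ["</s>"]
        for left, mid, right in zip(padded, padded[1:], padded[2:]):
            table.setdefault((left, right), Counter())[mid] += 1
    return table


def mutate(tokens, table, rng):
    """Mask one token and resample it from tokens seen in the same context."""
    pos = rng.randrange(len(tokens))
    padded = ["<s>"] + tokens + ["</s>"]
    choices = table.get((padded[pos], padded[pos + 2]))
    if not choices:
        return list(tokens)  # unseen context: leave the genome unchanged
    candidates = sorted(choices)
    weights = [choices[c] for c in candidates]
    new_token = rng.choices(candidates, weights=weights, k=1)[0]
    return tokens[:pos] + [new_token] + tokens[pos + 1:]


if __name__ == "__main__":
    vocab = build_vocab(population)
    tokenized = [tokenize(g, vocab) for g in population]
    table = fit_fill_table(tokenized)
    rng = random.Random(0)
    for toks in tokenized:
        print("".join(toks), "->", "".join(mutate(toks, table, rng)))
```

Because a replaced token can span several characters, a single masked position can change a whole fragment at once rather than one character. In a real setting the mutated strings would still need a validity check and scoring against the optimization targets, neither of which this sketch attempts.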
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-06CH11357; AC05-00OR22725; AC52-06NA25396; AC52-07NA27344
OSTI ID:
1845799
Journal Information:
Journal Name: IEEE Transactions on Evolutionary Computation; Journal Volume: 26; Journal Issue: 4; ISSN 1089-778X
Publisher:
IEEE
Country of Publication:
United States
Language:
English

