Automating Genetic Algorithm Mutations for Molecules Using a Masked Language Model
Journal Article · IEEE Transactions on Evolutionary Computation
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Inspired by the evolution of biological systems, genetic algorithms have been applied to generate solutions for optimization problems in a variety of scientific and engineering disciplines. For a given problem, a suitable genome representation must be defined along with a mutation operator to generate subsequent generations. Unlike natural systems, which display a variety of complex rearrangements (e.g., mobile genetic elements), mutation in genetic algorithms commonly uses only random point-wise changes. Generalizing beyond point-wise mutations is difficult, however, because useful genome rearrangements depend on the representation and the problem domain. To move beyond the limitations of manually defined point-wise changes, we propose using techniques from masked language models to generate mutations automatically. First, common subsequences within a given population are used to build a vocabulary, which is then used to tokenize each genome. A masked language model is trained on the tokenized data to generate possible rearrangements (i.e., mutations). To illustrate the proposed strategy, we use string representations of molecules and apply a genetic algorithm to optimize for drug-likeness and synthesizability. Our results show that moving beyond random point-wise mutations accelerates genetic algorithm optimization.
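The vocabulary → tokenize → masked-prediction pipeline described in the abstract can be illustrated with a small Python sketch. This is a toy stand-in under stated assumptions, not the authors' implementation. The vocabulary is learned with a byte-pair-encoding-style merge loop (an assumption; the abstract only says "common subsequences"), and tokenization is greedy longest match. In place of a trained neural masked language model, a count-based context table predicts a masked token from its left and right neighbours. All function names (`learn_vocab`, `tokenize`, `fit_context_model`, `masked_mutation`) and the example SMILES strings are hypothetical.

```python
import random
from collections import Counter, defaultdict


def learn_vocab(population, num_merges):
    """BPE-style vocabulary: repeatedly merge the most frequent adjacent token pair."""
    seqs = [list(s) for s in population]  # start from single characters
    vocab = {ch for s in seqs for ch in s}
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merged = a + b
        vocab.add(merged)
        new_seqs = []
        for s in seqs:  # replace non-overlapping (a, b) occurrences left to right
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return vocab


def tokenize(genome, vocab):
    """Greedy longest-match tokenization; unknown characters become 1-char tokens."""
    max_len = max(len(t) for t in vocab)
    tokens, i = [], 0
    while i < len(genome):
        for length in range(min(max_len, len(genome) - i), 0, -1):
            piece = genome[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


def fit_context_model(token_seqs):
    """Count tokens seen between each (left, right) neighbour pair.
    Hypothetical stand-in for training a masked language model."""
    table = defaultdict(Counter)
    for seq in token_seqs:
        padded = ["<s>"] + seq + ["</s>"]
        for i in range(1, len(padded) - 1):
            table[(padded[i - 1], padded[i + 1])][padded[i]] += 1
    return table


def masked_mutation(tokens, table, rng):
    """Mask one random token and refill it by sampling from the context counts."""
    padded = ["<s>"] + tokens + ["</s>"]
    i = rng.randrange(1, len(padded) - 1)
    candidates = table.get((padded[i - 1], padded[i + 1]))
    if not candidates:
        return tokens[:]  # context never seen: leave the genome unchanged
    choices, weights = zip(*candidates.items())
    new_tokens = tokens[:]
    new_tokens[i - 1] = rng.choices(choices, weights=weights, k=1)[0]
    return new_tokens


# Usage on a few arbitrary SMILES strings (hypothetical example population).
population = ["CCO", "CCN", "CCOC", "c1ccccc1O", "c1ccccc1N", "CC(=O)O"]
vocab = learn_vocab(population, num_merges=5)
token_seqs = [tokenize(g, vocab) for g in population]
table = fit_context_model(token_seqs)
rng = random.Random(0)
child = "".join(masked_mutation(token_seqs[0], table, rng))
print(token_seqs[0], "->", child)
```

Because a masked token may be replaced by a multi-character token, one mutation can swap a whole learned substring rather than a single character. That difference from point-wise changes is the one the abstract highlights. In a full genetic algorithm, `masked_mutation` would serve as the mutation operator, and a fitness score (such as drug-likeness) would select which children survive. This sketch does not check that mutated strings are valid SMILES.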
- Research Organization:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- Grant/Contract Number:
- AC02-06CH11357; AC05-00OR22725; AC52-06NA25396; AC52-07NA27344
- OSTI ID:
- 1845799
- Journal Information:
- IEEE Transactions on Evolutionary Computation, Vol. 26, Issue 4; ISSN 1089-778X
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English