Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
Abstract
We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations of molecules allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in a set of molecules with fewer than nine heavy atoms.
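The latent-space operations named in the abstract (decoding random vectors, perturbing a known point, interpolating between two points, and gradient-based optimization of a predicted property) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the latent dimension of 8, the function names, the straight-line interpolation, and above all the toy quadratic `predict` function, which stands in for the trained neural-network predictor. The encoder and decoder are omitted, so the vectors here are random placeholders rather than encoded molecules.

```python
import random

random.seed(0)
LATENT_DIM = 8  # hypothetical size; the abstract does not state a dimension


def random_vector():
    """Draw a latent vector from a standard normal distribution."""
    return [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def perturb(z, scale=0.1):
    """Add small Gaussian noise to a known latent vector."""
    return [zi + scale * random.gauss(0.0, 1.0) for zi in z]


def interpolate(z_a, z_b, steps=5):
    """Evenly spaced points on the straight line from z_a to z_b, inclusive."""
    return [
        [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]
        for t in (i / (steps - 1) for i in range(steps))
    ]


# Toy stand-in for the predictor: a concave quadratic with its maximum at Z_BEST.
Z_BEST = random_vector()


def predict(z):
    """Toy property estimate: negative squared distance to Z_BEST."""
    return -sum((zi - bi) ** 2 for zi, bi in zip(z, Z_BEST))


def predict_grad(z):
    """Analytic gradient of the toy predictor with respect to z."""
    return [-2.0 * (zi - bi) for zi, bi in zip(z, Z_BEST)]


def optimize(z0, lr=0.1, steps=100):
    """Plain gradient ascent on the predicted property."""
    z = list(z0)
    for _ in range(steps):
        grad = predict_grad(z)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


z_a, z_b = random_vector(), random_vector()  # placeholders for two encoded molecules
path = interpolate(z_a, z_b)                 # candidates between the two points
z_near = perturb(z_a)                        # a neighbor of a known point
z_opt = optimize(z_a)                        # climb the predicted property
print(f"predicted property: start={predict(z_a):.3f}, optimized={predict(z_opt):.3f}")
```

In a full pipeline each latent vector produced this way would be passed through the decoder to obtain a discrete molecular representation. With a learned predictor the gradient would typically come from automatic differentiation rather than a closed-form expression.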
- Authors:
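- Gómez-Bombarelli, Rafael; Wei, Jennifer N.; Duvenaud, David; Hernández-Lobato, José Miguel; Sánchez-Lengeling, Benjamín; Sheberla, Dennis; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Adams, Ryan P.; Aspuru-Guzik, Alán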
- Kyulux North America Inc., Boston, MA (United States)
- Harvard Univ., Cambridge, MA (United States)
- Univ. of Toronto, ON (Canada)
- Univ. of Cambridge (United Kingdom)
- Google Brain, Mountain View, CA (United States); Princeton Univ., NJ (United States)
- Harvard Univ., Cambridge, MA (United States); Canadian Institute for Advanced Research (CIFAR), Toronto, ON (Canada)
- Publication Date:
- Research Org.:
- Harvard Univ., Cambridge, MA (United States); Univ. of Toronto, ON (Canada)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22)
- OSTI Identifier:
- 1416858
- Alternate Identifier(s):
- OSTI ID: 1498675
- Grant/Contract Number:
- SC0015959
- Resource Type:
- Journal Article: Published Article
- Journal Name:
- ACS Central Science
- Additional Journal Information:
- Journal Volume: 4; Journal Issue: 2; Journal ID: ISSN 2374-7943
- Publisher:
- American Chemical Society (ACS)
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY
Citation Formats
Gómez-Bombarelli, Rafael, Wei, Jennifer N., Duvenaud, David, Hernández-Lobato, José Miguel, Sánchez-Lengeling, Benjamín, Sheberla, Dennis, Aguilera-Iparraguirre, Jorge, Hirzel, Timothy D., Adams, Ryan P., and Aspuru-Guzik, Alán. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. United States: N. p., 2018. Web. doi:10.1021/acscentsci.7b00572.
Gómez-Bombarelli, Rafael, Wei, Jennifer N., Duvenaud, David, Hernández-Lobato, José Miguel, Sánchez-Lengeling, Benjamín, Sheberla, Dennis, Aguilera-Iparraguirre, Jorge, Hirzel, Timothy D., Adams, Ryan P., & Aspuru-Guzik, Alán. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. United States. doi:10.1021/acscentsci.7b00572.
Gómez-Bombarelli, Rafael, Wei, Jennifer N., Duvenaud, David, Hernández-Lobato, José Miguel, Sánchez-Lengeling, Benjamín, Sheberla, Dennis, Aguilera-Iparraguirre, Jorge, Hirzel, Timothy D., Adams, Ryan P., and Aspuru-Guzik, Alán. Fri . "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules". United States. doi:10.1021/acscentsci.7b00572.
@article{osti_1416858,
title = {Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules},
author = {Gómez-Bombarelli, Rafael and Wei, Jennifer N. and Duvenaud, David and Hernández-Lobato, José Miguel and Sánchez-Lengeling, Benjamín and Sheberla, Dennis and Aguilera-Iparraguirre, Jorge and Hirzel, Timothy D. and Adams, Ryan P. and Aspuru-Guzik, Alán},
abstractNote = {We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations of molecules allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in a set of molecules with fewer than nine heavy atoms.},
doi = {10.1021/acscentsci.7b00572},
journal = {ACS Central Science},
issn = {2374-7943},
number = 2,
volume = 4,
place = {United States},
year = {2018},
month = {1}
}
Works referencing / citing this record:
Conditional deep surrogate models for stochastic, high-dimensional, and multi-fidelity systems
journal, May 2019
- Yang, Yibo; Perdikaris, Paris
- Computational Mechanics, Vol. 64, Issue 2
Machine-learning-assisted discovery of polymers with high thermal conductivity using a molecular design algorithm
journal, June 2019
- Wu, Stephen; Kondo, Yukiko; Kakimoto, Masa-aki
- npj Computational Materials, Vol. 5, Issue 1
Accelerating the discovery of materials for clean energy in the era of smart automation
journal, April 2018
- Tabor, Daniel P.; Roch, Loïc M.; Saikin, Semion K.
- Nature Reviews Materials, Vol. 3, Issue 5
Controlling an organic synthesis robot with machine learning to search for new reactivity
journal, July 2018
- Granda, Jarosław M.; Donina, Liva; Dragone, Vincenza
- Nature, Vol. 559, Issue 7714
Extensive deep neural networks for transferring small scale learning to large scale systems
journal, January 2019
- Mills, Kyle; Ryczko, Kevin; Luchak, Iryna
- Chemical Science, Vol. 10, Issue 15
Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery
journal, July 2019
- Lee, Munhwan; Kim, Hyeyeon; Joe, Hyunwhan
- Journal of Cheminformatics, Vol. 11, Issue 1
Exploring differential evolution for inverse QSAR analysis
journal, January 2017
- Miyao, Tomoyuki; Funatsu, Kimito; Bajorath, Jürgen
- F1000Research, Vol. 6
Challenges and opportunities of polymer design with machine learning and high throughput experimentation
journal, May 2019
- Kumar, Jatin N.; Li, Qianxiao; Jun, Ye
- MRS Communications, Vol. 9, Issue 02