Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules
Abstract
We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations of molecules allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in a set of molecules with fewer than nine heavy atoms.
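The abstract names three latent-space operations (decoding random vectors, perturbing known structures, interpolating between molecules) and gradient-based optimization of a predicted property. The Python sketch below illustrates only the vector arithmetic behind those operations, under explicit assumptions: latent points are plain lists of floats, no trained encoder or decoder is included (so nothing is turned back into a molecule), a standard-normal prior is assumed as in a variational autoencoder, and `toy_predictor` is a hypothetical quadratic stand-in for the trained property predictor. All function names are illustrative and are not the authors' code.

```python
import random
from typing import Callable, List

Vector = List[float]


def sample_prior(dim: int, rng: random.Random) -> Vector:
    """Draw a latent point from a standard normal prior (assumes a VAE-style prior)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def perturb(z: Vector, sigma: float, rng: random.Random) -> Vector:
    """Add isotropic Gaussian noise to a latent point, e.g. the encoding of a known molecule."""
    return [x + rng.gauss(0.0, sigma) for x in z]


def interpolate(z_a: Vector, z_b: Vector, steps: int) -> List[Vector]:
    """Return `steps` evenly spaced points on the straight line from z_a to z_b (endpoints included)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical stand-in for the trained property predictor: a concave quadratic
# whose maximum sits at TARGET, so its gradient can be written down exactly.
TARGET: Vector = [1.0, -0.5, 2.0]


def toy_predictor(z: Vector) -> float:
    return -sum((x - c) ** 2 for x, c in zip(z, TARGET))


def toy_predictor_grad(z: Vector) -> Vector:
    return [-2.0 * (x - c) for x, c in zip(z, TARGET)]


def gradient_ascent(z0: Vector, grad: Callable[[Vector], Vector],
                    lr: float = 0.1, iters: int = 100) -> Vector:
    """Plain fixed-step gradient ascent on the predicted property in latent space."""
    z = list(z0)
    for _ in range(iters):
        g = grad(z)
        z = [x + lr * gi for x, gi in zip(z, g)]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    z_a = sample_prior(3, rng)        # "random vector" in latent space
    z_b = perturb(z_a, 0.5, rng)      # nearby point around z_a
    for z in interpolate(z_a, z_b, steps=5):
        print([round(x, 3) for x in z])
    z_opt = gradient_ascent(z_a, toy_predictor_grad)
    print([round(x, 4) for x in z_opt], round(toy_predictor(z_opt), 6))
```

In the full method, each sampled, interpolated, or optimized point would be passed through the trained decoder to recover a discrete structure; this sketch stops at the vectors, and the fixed-step ascent is a generic choice rather than the optimizer used in the paper.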
- Authors:
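- Gómez-Bombarelli, Rafael; Wei, Jennifer N.; Duvenaud, David; Hernández-Lobato, José Miguel; Sánchez-Lengeling, Benjamín; Sheberla, Dennis; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Adams, Ryan P.; Aspuru-Guzik, Alán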
- Kyulux North America Inc., 10 Post Office Square, Suite 800, Boston, Massachusetts 02109, United States
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Department of Computer Science, University of Toronto, 6 King’s College Road, Toronto, Ontario M5S 3H5, Canada
- Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K.
- Google Brain, Mountain View, California, United States; Princeton University, Princeton, New Jersey, United States
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States; Biologically-Inspired Solar Energy Program, Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario M5S 1M1, Canada
- Publication Date:
- January 12, 2018
- Research Org.:
- Harvard Univ., Cambridge, MA (United States); Univ. of Toronto, ON (Canada)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- OSTI Identifier:
- 1416858
- Alternate Identifier(s):
- OSTI ID: 1498675
- Grant/Contract Number:
- SC0015959
- Resource Type:
- Published Article
- Journal Name:
- ACS Central Science
- Additional Journal Information:
- Journal Name: ACS Central Science; Journal Volume: 4; Journal Issue: 2; Journal ID: ISSN 2374-7943
- Publisher:
- American Chemical Society
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY
Citation Formats
Gómez-Bombarelli, Rafael, Wei, Jennifer N., Duvenaud, David, Hernández-Lobato, José Miguel, Sánchez-Lengeling, Benjamín, Sheberla, Dennis, Aguilera-Iparraguirre, Jorge, Hirzel, Timothy D., Adams, Ryan P., and Aspuru-Guzik, Alán. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. United States: N. p., 2018. Web. doi:10.1021/acscentsci.7b00572.
Gómez-Bombarelli, Rafael, Wei, Jennifer N., Duvenaud, David, Hernández-Lobato, José Miguel, Sánchez-Lengeling, Benjamín, Sheberla, Dennis, Aguilera-Iparraguirre, Jorge, Hirzel, Timothy D., Adams, Ryan P., & Aspuru-Guzik, Alán. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. United States. https://doi.org/10.1021/acscentsci.7b00572
Gómez-Bombarelli, Rafael, Wei, Jennifer N., Duvenaud, David, Hernández-Lobato, José Miguel, Sánchez-Lengeling, Benjamín, Sheberla, Dennis, Aguilera-Iparraguirre, Jorge, Hirzel, Timothy D., Adams, Ryan P., and Aspuru-Guzik, Alán. 2018. "Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules." United States. https://doi.org/10.1021/acscentsci.7b00572.
@article{osti_1416858,
title = {Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules},
author = {Gómez-Bombarelli, Rafael and Wei, Jennifer N. and Duvenaud, David and Hernández-Lobato, José Miguel and Sánchez-Lengeling, Benjamín and Sheberla, Dennis and Aguilera-Iparraguirre, Jorge and Hirzel, Timothy D. and Adams, Ryan P. and Aspuru-Guzik, Alán},
abstractNote = {We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations of molecules allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in a set of molecules with fewer than nine heavy atoms.},
doi = {10.1021/acscentsci.7b00572},
journal = {ACS Central Science},
number = 2,
volume = 4,
place = {United States},
year = {2018},
month = jan
}
https://doi.org/10.1021/acscentsci.7b00572