OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules

Journal Article · ACS Central Science
[1]; [2]; [3]; [4]; [2]; [2]; [1]; [1]; [5]; [6]
  1. Kyulux North America Inc., 10 Post Office Square, Suite 800, Boston, Massachusetts 02109, United States
  2. Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
  3. Department of Computer Science, University of Toronto, 6 King’s College Road, Toronto, Ontario M5S 3H5, Canada
  4. Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K.
  5. Google Brain, Mountain View, California, United States; Princeton University, Princeton, New Jersey, United States
  6. Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States; Biologically-Inspired Solar Energy Program, Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario M5S 1M1, Canada

We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations of molecules allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in a set of molecules with fewer than nine heavy atoms.
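
The latent-space operations named in the abstract (perturbing a known point, interpolating between two points, and following a property gradient) can be illustrated with a minimal Python sketch. This is not the authors' model: latent vectors are plain lists of floats, the encoder and decoder are omitted entirely, and `toy_property` is a hypothetical quadratic stand-in for a trained predictor, chosen only because its gradient is known in closed form. All function names here are assumptions made for illustration.

```python
import random


def interpolate(z_a, z_b, steps):
    """Linear interpolation between two latent vectors, endpoints included.

    Requires steps >= 2.
    """
    return [
        [a + (b - a) * t / (steps - 1) for a, b in zip(z_a, z_b)]
        for t in range(steps)
    ]


def perturb(z, scale, rng):
    """Add independent Gaussian noise (std = scale) to each latent coordinate."""
    return [x + rng.gauss(0.0, scale) for x in z]


def toy_property(z, z_star):
    """Hypothetical predictor: highest (zero) at z_star, lower elsewhere."""
    return -sum((zi - si) ** 2 for zi, si in zip(z, z_star))


def toy_property_grad(z, z_star):
    """Analytic gradient of toy_property with respect to z."""
    return [-2.0 * (zi - si) for zi, si in zip(z, z_star)]


def gradient_ascent(z, z_star, step_size=0.1, n_steps=50):
    """Move z uphill on the toy predictor by repeated gradient steps."""
    z = list(z)
    for _ in range(n_steps):
        grad = toy_property_grad(z, z_star)
        z = [zi + step_size * gi for zi, gi in zip(z, grad)]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    z_start = [0.0, 0.0]
    z_target = [1.0, -1.0]
    print(interpolate(z_start, z_target, 5))
    print(perturb(z_start, 0.1, rng))
    print(gradient_ascent(z_start, z_target))
```

In a trained model of the kind the abstract describes, each vector produced by these operations would be passed through the decoder to obtain a discrete molecular representation, and the gradient would come from the learned predictor rather than from a closed-form expression.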

Research Organization:
Harvard Univ., Cambridge, MA (United States); Univ. of Toronto, ON (Canada)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
SC0015959
OSTI ID:
1416858
Alternate ID(s):
OSTI ID: 1498675
Journal Information:
Journal Name: ACS Central Science; Journal Volume: 4; Journal Issue: 2; ISSN 2374-7943
Publisher:
American Chemical Society
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 1726 works
Citation information provided by
Web of Science
