OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Deep learning to generate in silico chemical property libraries and candidate molecules for small molecule identification in complex samples

Journal Article · Analytical Chemistry

Comprehensive and unambiguous identification of small molecules in complex samples will revolutionize our understanding of the role of metabolites in biological systems. Existing and emerging technologies can measure chemical properties of molecules in complex mixtures and, used in concert, are sensitive enough to resolve even stereoisomers. Despite these experimental advances, small molecule identification is hindered by (i) chemical reference libraries (e.g., mass spectra, collision cross section (CCS), and other measurable property libraries) that represent <1% of known molecules, which limits the number of possible identifications, and (ii) the lack of a method to generate candidate matches directly from experimental features (i.e., without a library). To this end, we developed a variational autoencoder (VAE) that learns a continuous numerical, or latent, representation of molecular structure in order to expand reference libraries for small molecule identification. We extended the VAE with a chemical property decoder, trained as a multitask network, to shape the latent representation so that it organizes according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification: it focuses on properties that can be obtained from experimental measurements (m/z, CCS) and uses a training paradigm built on a cascade of transfer learning iterations. First, molecular representation is learned from a large dataset of structures with m/z labels. Next, because experimental property data are limited, training continues with in silico property values. Finally, the network is refined by training on the experimental data. This allows the network to learn as much as possible at each stage, enabling success with progressively smaller datasets without overfitting.
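As a rough illustration of how a property decoder can shape a VAE's latent space, the sketch below combines three terms into one multitask objective: a reconstruction cross-entropy over structure tokens, a Kullback-Leibler term that pulls the latent distribution toward a standard normal, and a mean squared error on predicted properties such as m/z and CCS. This is not the authors' implementation. The function names, the equal default weights `kl_weight` and `prop_weight`, and the token-index encoding of structures are all assumptions made for the example.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def cross_entropy(target_indices, predicted_probs):
    """Mean negative log-likelihood of the true token at each sequence position."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(probs[t] + eps) for t, probs in zip(target_indices, predicted_probs))
    return -total / len(target_indices)


def mean_squared_error(y_true, y_pred):
    """Mean squared error over the predicted property values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def multitask_vae_loss(target_indices, predicted_probs, mu, log_var,
                       props_true, props_pred, kl_weight=1.0, prop_weight=1.0):
    """Weighted sum of reconstruction, KL, and property losses (weights are assumed)."""
    reconstruction = cross_entropy(target_indices, predicted_probs)
    kl = kl_to_standard_normal(mu, log_var)
    prop = mean_squared_error(props_true, props_pred)
    return reconstruction + kl_weight * kl + prop_weight * prop
```

In a transfer learning cascade like the one described, the same objective could in principle be reused at every stage, with only the source of the property labels changing (m/z only, then in silico values, then experimental values). How the stages are weighted or scheduled is not stated in the abstract.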
Once trained, the network can be used to predict chemical properties directly from structure, as well as to generate candidate structures with desired chemical properties. Our approach is orders of magnitude faster than first-principles simulation for CCS property prediction. Additionally, the ability to generate novel molecules along manifolds defined by chemical property analogues positions the network, DarkChem, as highly useful in a number of application areas, including metabolomics and small molecule identification, drug discovery and design, chemical forensics, and beyond.
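To make concrete how predicted properties could support identification, here is a minimal sketch that matches an observed feature against a library of predicted (m/z, CCS) values. It is an assumption-laden example, not a procedure taken from the paper: the tolerances (10 ppm for m/z, 3% for CCS), the function names, and the placeholder library entries are all hypothetical.

```python
def ppm_error(observed, predicted):
    """Signed m/z error in parts per million, relative to the predicted value."""
    return (observed - predicted) / predicted * 1e6


def percent_error(observed, predicted):
    """Signed error in percent, relative to the predicted value."""
    return (observed - predicted) / predicted * 100.0


def match_candidates(observed_mz, observed_ccs, library, mz_tol_ppm=10.0, ccs_tol_pct=3.0):
    """Return library entries within both tolerances, best CCS agreement first.

    `library` is an iterable of (name, predicted_mz, predicted_ccs) tuples.
    The default tolerances are illustrative assumptions.
    """
    hits = []
    for name, mz, ccs in library:
        mz_err = ppm_error(observed_mz, mz)
        ccs_err = percent_error(observed_ccs, ccs)
        if abs(mz_err) <= mz_tol_ppm and abs(ccs_err) <= ccs_tol_pct:
            hits.append((name, mz_err, ccs_err))
    # Rank by absolute CCS error so the closest match comes first.
    return sorted(hits, key=lambda hit: abs(hit[2]))


# Placeholder entries with made-up values, for illustration only.
library = [
    ("candidate_a", 200.0000, 150.0),
    ("candidate_b", 200.0010, 140.0),
    ("candidate_c", 300.0000, 150.0),
]
matches = match_candidates(200.0005, 149.0, library)
```

With these placeholder values, only `candidate_a` falls inside both tolerances: `candidate_b` agrees in m/z but not in CCS, and `candidate_c` fails on m/z. This shows why adding a second measured property narrows the candidate list.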

Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization: USDOE
DOE Contract Number: AC05-76RL01830
OSTI ID: 1597650
Report Number(s): PNNL-SA-144150
Journal Information: Analytical Chemistry, Vol. 92, Issue 2
Country of Publication: United States
Language: English
