U.S. Department of Energy
Office of Scientific and Technical Information

Deep learning to generate in silico chemical property libraries and candidate molecules for small molecule identification in complex samples

Journal Article · Analytical Chemistry

Comprehensive and unambiguous identification of small molecules in complex samples will revolutionize our understanding of the role of metabolites in biological systems. Existing and emerging technologies have enabled measurement of chemical properties of molecules in complex mixtures and, in concert, are sensitive enough to resolve even stereoisomers. Despite these experimental advances, small molecule identification is hindered by (i) chemical reference libraries (e.g., mass spectra, collision cross section (CCS), and other measurable property libraries) that represent <1% of known molecules, limiting the number of possible identifications, and (ii) the lack of a method to generate candidate matches directly from experimental features (i.e., without a library). To this end, we developed a variational autoencoder (VAE) to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. We extended the VAE to include a chemical property decoder, trained as a multitask network, in order to shape the latent representation such that it is organized according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, with its focus on properties that can be obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involves a cascade of transfer learning iterations. First, molecular representation is learned from a large dataset of structures with m/z labels. Next, in silico property values are used to continue training, as experimental property data are limited. Finally, the network is further refined by training on the experimental data. This allows the network to learn as much as possible at each stage, enabling success with progressively smaller datasets without overfitting.
Once trained, the network can be used to predict chemical properties directly from structure, as well as to generate candidate structures with desired chemical properties. Our approach is orders of magnitude faster than first-principles simulation for CCS property prediction. Additionally, the ability to generate novel molecules along manifolds defined by chemical property analogues positions the network, DarkChem, as highly useful in a number of application areas, including metabolomics and small molecule identification, drug discovery and design, chemical forensics, and beyond.
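
The multitask objective described in the abstract can be illustrated with a minimal sketch. It assumes a common VAE formulation (categorical cross-entropy for reconstructing the structure string, a KL divergence term pulling the latent code toward a standard normal, and a mean squared error term for the property decoder); the abstract does not state the exact loss, so the function names, the weights `kl_weight` and `prop_weight`, and the example values below are all hypothetical and not taken from DarkChem.

```python
import math

def reconstruction_cross_entropy(target_ids, probs, eps=1e-12):
    """Cross-entropy between target tokens and the structure decoder's
    per-position probabilities, summed over sequence positions."""
    return -sum(math.log(p[t] + eps) for t, p in zip(target_ids, probs))

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I),
    summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def property_mse(y_true, y_pred):
    """Mean squared error over the predicted properties (e.g. m/z, CCS)."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def multitask_vae_loss(target_ids, probs, mu, log_var, y_true, y_pred,
                       kl_weight=1.0, prop_weight=1.0):
    """Weighted sum of the three terms; the weights are assumptions."""
    recon = reconstruction_cross_entropy(target_ids, probs)
    kl = kl_to_standard_normal(mu, log_var)
    prop = property_mse(y_true, y_pred)
    return recon + kl_weight * kl + prop_weight * prop

# Toy example: a 3-token structure over a 4-symbol vocabulary,
# a 2-dimensional latent code, and two properties (m/z, CCS).
target_ids = [0, 2, 1]
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
]
mu, log_var = [1.0, 0.0], [0.0, 0.0]
y_true, y_pred = [300.0, 150.0], [301.0, 152.0]

loss = multitask_vae_loss(target_ids, probs, mu, log_var, y_true, y_pred)
print(f"total loss: {loss:.4f}")
```

Under this reading, the property term is what shapes the latent space: gradients from the property error flow back through the encoder, so molecules with similar property values are pushed toward nearby latent codes. In practice the properties would likely be normalized before computing the error, since raw m/z and CCS values differ in scale.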

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1597650
Report Number(s):
PNNL-SA-144150
Journal Information:
Analytical Chemistry, Vol. 92, Issue 2
Country of Publication:
United States
Language:
English

