DOE Patents
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Method for simultaneous characterization and expansion of reference libraries for small molecule identification

Abstract

A variational autoencoder (VAE) has been developed to learn a continuous numerical (latent) representation of molecular structure in order to expand reference libraries for small molecule identification. The VAE has been extended to include a chemical property decoder, trained as a multitask network, which shapes the latent representation so that it organizes according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification: it focuses on properties obtained from experimental measurements (mass-to-charge ratio, m/z, and collision cross section, CCS) and uses a training paradigm built on a cascade of transfer learning iterations. First, a molecular representation is learned from a large dataset of structures with m/z labels. Next, training continues with in silico property values. Finally, the network is refined by training on experimental data. The trained network is used to predict chemical properties directly from structure and to generate candidate structures with desired chemical properties. The network is extensible to other training data and molecular representations and to other analytical platforms, for chemical property and feature prediction as well as molecular structure generation.
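
The abstract does not state the loss function or network details, so the following is only a minimal sketch of how a multitask VAE objective of this kind is commonly written, not the patented implementation. It assumes a diagonal Gaussian latent posterior, a mean-squared-error property term, and a mask for properties that a given training stage does not label (for example, CCS during the first m/z-only stage). The function names, the weights `kl_weight` and `prop_weight`, and the `STAGES` table are hypothetical.

```python
import math

def kl_divergence(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def masked_property_loss(predicted, target):
    """Mean squared error over properties that have a label (target is not None)."""
    errors = [(p - t) ** 2 for p, t in zip(predicted, target) if t is not None]
    return sum(errors) / len(errors) if errors else 0.0

def multitask_loss(recon_loss, mu, log_var, predicted, target,
                   kl_weight=1.0, prop_weight=1.0):
    """Assumed objective: reconstruction + weighted KL + weighted property error."""
    return (recon_loss
            + kl_weight * kl_divergence(mu, log_var)
            + prop_weight * masked_property_loss(predicted, target))

# Hypothetical description of the three-stage transfer learning cascade:
# each stage continues from the previous stage's weights with different labels.
STAGES = [
    ("large structure dataset", ("mz",)),
    ("in silico property values", ("mz", "ccs")),
    ("experimental measurements", ("mz", "ccs")),
]

if __name__ == "__main__":
    # Stage-1 style example: m/z is labeled, CCS is missing (None) and is masked out.
    loss = multitask_loss(recon_loss=2.0, mu=[0.0], log_var=[0.0],
                          predicted=[3.0, 5.0], target=[1.0, None])
    print(loss)
```

Masking unlabeled properties is one way to let the same network train on datasets with different label coverage, which is what a staged cascade like the one described would require.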

Inventors:
Colby, Sean M.; Renslow, Ryan S.
Issue Date:
02/21/2023
Research Org.:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1987154
Patent Number(s):
11587646
Application Number:
16/702,119
Assignee:
Battelle Memorial Institute (Richland, WA)
DOE Contract Number:
AC05-76RL01830
Resource Type:
Patent
Resource Relation:
Patent File Date: 12/03/2019
Country of Publication:
United States
Language:
English

Citation Formats

Colby, Sean M., and Renslow, Ryan S. Method for simultaneous characterization and expansion of reference libraries for small molecule identification. United States: N. p., 2023. Web.
Colby, Sean M., & Renslow, Ryan S. (2023). Method for simultaneous characterization and expansion of reference libraries for small molecule identification. United States.
Colby, Sean M., and Renslow, Ryan S. 2023. "Method for simultaneous characterization and expansion of reference libraries for small molecule identification". United States. https://www.osti.gov/servlets/purl/1987154.
@article{osti_1987154,
title = {Method for simultaneous characterization and expansion of reference libraries for small molecule identification},
author = {Colby, Sean M. and Renslow, Ryan S.},
abstractNote = {A variational autoencoder (VAE) has been developed to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. The VAE has been extended to include a chemical property decoder, trained as a multitask network, to shape the latent representation such that it assembles according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, focused on properties that are obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involves a cascade of transfer learning iterations. First, molecular representation is learned from a large dataset of structures with m/z labels. Next, in silico property values are used to continue training. Finally, the network is further refined by being trained with the experimental data. The trained network is used to predict chemical properties directly from structure and generate candidate structures with desired chemical properties. The network is extensible to other training data and molecular representations, and for use with other analytical platforms, for both chemical property and feature prediction as well as molecular structure generation.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2023},
month = feb
}

Works referenced in this record:

Generating cross-domain data using variational mapping between embedding spaces
patent, January 2021

Structure-Based Modeling and Target-Selectivity Prediction
patent-application, December 2016