Method for simultaneous characterization and expansion of reference libraries for small molecule identification
Abstract
A variational autoencoder (VAE) has been developed to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. The VAE has been extended to include a chemical property decoder, trained as a multitask network, to shape the latent representation such that it assembles according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification, focused on properties that are obtained from experimental measurements (m/z, CCS) paired with its training paradigm, which involves a cascade of transfer learning iterations. First, molecular representation is learned from a large dataset of structures with m/z labels. Next, in silico property values are used to continue training. Finally, the network is further refined by being trained with the experimental data. The trained network is used to predict chemical properties directly from structure and generate candidate structures with desired chemical properties. The network is extensible to other training data and molecular representations, and for use with other analytical platforms, for both chemical property and feature prediction as well as molecular structure generation.
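The training objective implied by the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the loss is a sum of a structure reconstruction term, a KL regularizer on the latent code, and a property regression term that skips unlabeled properties. All function names, the weights `kl_weight` and `prop_weight`, the stage table, and the numeric values are hypothetical.

```python
import math

PROPERTIES = ("mz", "ccs")  # property names are illustrative assumptions

# Hypothetical cascade: each stage would continue from the previous stage's
# weights and differs only in which property labels are available.
STAGES = [
    ("structures with m/z labels", ("mz",)),
    ("in silico property values", ("mz", "ccs")),
    ("experimental data", ("mz", "ccs")),
]


def kl_divergence(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reconstruction_loss(token_probs, target_ids):
    """Cross-entropy between decoded token probabilities and the target tokens."""
    return -sum(math.log(p[t]) for p, t in zip(token_probs, target_ids))


def property_loss(predicted, labels):
    """Mean squared error over labeled properties; None marks a missing label."""
    errors = [(p - y) ** 2 for p, y in zip(predicted, labels) if y is not None]
    return sum(errors) / len(errors) if errors else 0.0


def mask_labels(record, available):
    """Keep only the labels available in the current training stage."""
    return [record.get(name) if name in available else None for name in PROPERTIES]


def multitask_loss(token_probs, target_ids, mu, log_var, predicted, labels,
                   kl_weight=1.0, prop_weight=1.0):
    """Combined objective: reconstruction + weighted KL + weighted property error."""
    return (reconstruction_loss(token_probs, target_ids)
            + kl_weight * kl_divergence(mu, log_var)
            + prop_weight * property_loss(predicted, labels))


# Made-up encoder/decoder outputs for one molecule, for illustration only.
token_probs = [[0.5, 0.5], [0.25, 0.75]]   # per-position token probabilities
target_ids = [0, 1]                        # true token at each position
mu, log_var = [0.0, 0.0], [0.0, 0.0]       # latent mean and log-variance
predicted = [180.0, 140.0]                 # predicted (m/z, CCS)
record = {"mz": 182.0, "ccs": 143.0}       # hypothetical label values

for stage_name, available in STAGES:
    labels = mask_labels(record, available)
    loss = multitask_loss(token_probs, target_ids, mu, log_var, predicted, labels)
    print(f"{stage_name}: loss = {loss:.4f}")
```

Masking missing labels lets one network train on all three datasets without changing its architecture between stages. In practice the properties would likely be normalized so that m/z and CCS errors contribute on comparable scales.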
- Inventors:
- Colby, Sean M.; Renslow, Ryan S.
- Issue Date:
- 02/21/2023
- Research Org.:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1987154
- Patent Number(s):
- 11587646
- Application Number:
- 16/702,119
- Assignee:
- Battelle Memorial Institute (Richland, WA)
- DOE Contract Number:
- AC05-76RL01830
- Resource Type:
- Patent
- Resource Relation:
- Patent File Date: 12/03/2019
- Country of Publication:
- United States
- Language:
- English
Citation Formats
Colby, Sean M., and Renslow, Ryan S. Method for simultaneous characterization and expansion of reference libraries for small molecule identification. United States: N. p., 2023. Web.
Colby, Sean M., & Renslow, Ryan S. Method for simultaneous characterization and expansion of reference libraries for small molecule identification. United States.
Colby, Sean M., and Renslow, Ryan S. 2023. "Method for simultaneous characterization and expansion of reference libraries for small molecule identification". United States. https://www.osti.gov/servlets/purl/1987154.
@misc{osti_1987154,
title = {Method for simultaneous characterization and expansion of reference libraries for small molecule identification},
author = {Colby, Sean M. and Renslow, Ryan S.},
note = {US Patent 11,587,646},
url = {https://www.osti.gov/servlets/purl/1987154},
address = {United States},
year = {2023},
month = feb
}