Method for simultaneous characterization and expansion of reference libraries for small molecule identification
A variational autoencoder (VAE) has been developed to learn a continuous numerical, or latent, representation of molecular structure to expand reference libraries for small molecule identification. The VAE has been extended to include a chemical property decoder, trained as a multitask network, to shape the latent representation such that it organizes according to desired chemical properties. The approach is unique in its application to metabolomics and small molecule identification: it focuses on properties obtained from experimental measurements (mass-to-charge ratio, m/z, and collision cross section, CCS) and uses a training paradigm built on a cascade of transfer learning iterations. First, a molecular representation is learned from a large dataset of structures with m/z labels. Next, training continues with in silico property values. Finally, the network is refined by training on experimental data. The trained network is used to predict chemical properties directly from structure and to generate candidate structures with desired chemical properties. The network is extensible to other training data and molecular representations, and to other analytical platforms, for chemical property and feature prediction as well as molecular structure generation.
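The multitask objective and the three-stage training cascade described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a standard VAE objective (reconstruction term plus KL divergence to a unit Gaussian prior) with an added property-regression term. The function names, the loss weights, the use of `None` for a missing property label, and the assignment of properties to stages in `CASCADE` are all hypothetical; the abstract only states that the first stage uses m/z labels.

```python
import math

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def masked_property_mse(pred, target):
    """Mean squared error over labeled properties only.

    Assumption: a missing label (e.g. no CCS value) is encoded as None
    and contributes nothing to the loss.
    """
    pairs = [(p, t) for p, t in zip(pred, target) if t is not None]
    if not pairs:
        return 0.0
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)

def multitask_vae_loss(recon_loss, mu, log_var, prop_pred, prop_true,
                       kl_weight=1.0, prop_weight=1.0):
    """Reconstruction + KL + property terms; the weights are hypothetical."""
    return (recon_loss
            + kl_weight * kl_divergence(mu, log_var)
            + prop_weight * masked_property_mse(prop_pred, prop_true))

# Hypothetical stage definitions. Only the m/z labels of the first stage
# are stated in the abstract; the property sets of the later stages are
# assumptions made for illustration.
CASCADE = [
    ("structures with m/z labels", ("mz",)),
    ("in silico property values", ("mz", "ccs")),
    ("experimental data", ("mz", "ccs")),
]

def run_cascade(train_stage, weights=None):
    """Train stage by stage, carrying the weights forward each time.

    train_stage(weights, stage_name, properties) is a caller-supplied
    function that returns the updated weights.
    """
    for stage_name, properties in CASCADE:
        weights = train_stage(weights, stage_name, properties)
    return weights
```

Each stage starts from the weights produced by the previous one, which is what makes the cascade a form of transfer learning rather than three independent training runs. Masking missing labels lets a single property decoder be trained on datasets that carry different subsets of properties.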
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- Assignee: Battelle Memorial Institute (Richland, WA)
- Patent Number(s): 11,587,646
- Application Number: 16/702,119
- OSTI ID: 1987154
- Resource Relation: Patent File Date: 12/03/2019
- Country of Publication: United States
- Language: English