U.S. Department of Energy
Office of Scientific and Technical Information

Array-Based Machine Learning for Functional Group Detection in Electron Ionization Mass Spectrometry

Journal Article · ACS Omega
 [1];  [1];  [2];  [1]
  1. The Ohio State Univ., Columbus, OH (United States)
  2. California Institute of Technology (CalTech), Pasadena, CA (United States). Jet Propulsion Laboratory (JPL)
Mass spectrometry is a ubiquitous technique capable of complex chemical analysis. The fragmentation patterns that appear in mass spectra are an excellent target for artificial intelligence methods that automate and expedite data analysis to identify targets such as functional groups. To develop this approach, we trained models on electron ionization (a reproducible hard fragmentation technique) mass spectra so that not only the final model accuracies but also the reasoning behind model assignments could be evaluated. The convolutional neural network (CNN) models were trained on 2D images of the spectra using transfer learning from Inception V3, and the logistic regression models were trained on array-based data using the scikit-learn implementation in Python. Our training dataset consisted of 21,166 mass spectra from the United States' National Institute of Standards and Technology (NIST) WebBook. The data were used to train models to identify functional groups, covering both specific groups (e.g., amines, esters) and generalized classifications (aromatics, oxygen-containing functional groups, and nitrogen-containing functional groups). The highest final accuracies on new data were obtained with logistic regression rather than with transfer learning on CNN models. We also determined that the mass range most beneficial for functional group analysis is 0–100 m/z. The models correctly identified functional groups of example molecules selected from both the NIST database and experimental data. Beyond functional group analysis, we also developed a methodology to identify the fragments that most strongly influence accurate detection of the models' targets. The results demonstrate a potential pathway for analyzing and screening substantial amounts of mass spectral data.
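The array-based approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the unit-mass binning, the base-peak normalization, the synthetic toy spectra (with a hypothetical marker peak at m/z 30), and all function names are assumptions made for illustration. Only the use of scikit-learn logistic regression and the 0–100 m/z range come from the record above. Ranking coefficients by magnitude is one simple way to look for impactful fragments and may differ from the methodology in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

MAX_MZ = 100  # assumed cutoff, following the 0-100 m/z range named in the abstract


def spectrum_to_array(peaks, max_mz=MAX_MZ):
    """Turn (m/z, intensity) pairs into a fixed-length vector scaled to its largest retained peak."""
    vec = np.zeros(max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))  # assumption: unit-mass bins
        if 0 <= idx <= max_mz:
            vec[idx] = max(vec[idx], intensity)
    top = vec.max()
    return vec / top if top > 0 else vec


def make_synthetic_spectrum(rng, has_group):
    """Hypothetical toy data: random peaks, plus a strong peak at m/z 30 when the group is present."""
    mzs = rng.integers(10, 150, size=12)
    peaks = [(float(mz), float(rng.uniform(5, 60))) for mz in mzs]
    if has_group:
        peaks.append((30.0, 100.0))
    return peaks


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)  # 1 = functional group present (toy labels)
X = np.array([spectrum_to_array(make_synthetic_spectrum(rng, bool(label))) for label in y])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# One binary classifier per functional group; only a single toy group is shown here.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Rank m/z bins by absolute coefficient as a rough indicator of influential fragments.
top_fragments = np.argsort(np.abs(model.coef_[0]))[::-1][:5]
print(f"held-out accuracy: {accuracy:.3f}")
print("most influential m/z bins:", top_fragments.tolist())
```

Because each feature index corresponds directly to an m/z value, the fitted coefficients can be read as per-fragment weights, which is one reason an array-based linear model is easier to interrogate than an image-based CNN.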
Research Organization:
The Ohio State Univ., Columbus, OH (United States)
Sponsoring Organization:
National Aeronautics and Space Administration (NASA); National Science Foundation (NSF); USDOE Office of Science (SC), Basic Energy Sciences (BES)
Grant/Contract Number:
SC0016381
Other Award/Contract Number:
20-PLANET20-0067
80NM0018D0004
CHE-1801971
OSTI ID:
2420500
Journal Information:
Journal Name: ACS Omega; Journal Volume: 8; Journal Issue: 27; ISSN 2470-1343
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
