OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification

Journal Article · Metabolites

Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this “molecular snapshot” is only as informative as the number of metabolites confidently identified within it. In mass spectrometry approaches to metabolomics, compounds are traditionally identified with a spectral similarity (SS) score, which matches each query spectrum against a reference library of candidate spectra. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. The lack of a standard SS score creates analytic uncertainty and can lead to reproducibility problems, especially as these data are integrated with other domains. In this work, we use metabolomic spectral similarity as a case study of the consistency challenges, within just one piece of the One Health framework, that must be addressed to enable data science approaches to One Health problems. Using a large cohort of both standard and complex datasets with expert-verified truth annotations, we evaluated how well 66 similarity metrics delineate between correct matches (true positives) and incorrect matches (true negatives). We additionally characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families) tend to perform better than others, and that no single similarity metric performs optimally for all queried spectra. This work and its findings provide an empirically based resource to help researchers select similarity metrics for GC-MS identification, increasing scientific reproducibility by taking steps toward standardized identification workflows.
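To make "matching a query spectrum against a reference library" concrete, the sketch below scores a query against candidate spectra with one representative metric from each of the three families the abstract highlights. It is a minimal illustration under stated assumptions, not the authors' implementation: spectra are assumed to be unit-mass dictionaries of m/z to intensity; cosine is taken as the Inner Product representative, Pearson correlation as the Correlative representative, and the sum of element-wise minima of sum-normalized spectra as the Intersection representative. These choices, the function names (`align`, `rank_library`, etc.), and the toy spectra are hypothetical; the paper's 66 metrics and their exact definitions may differ.

```python
import math

# Hypothetical representation: a spectrum is a dict of integer m/z -> intensity,
# which suits unit-mass-resolution GC-MS data.

def align(query, reference):
    """Place both spectra on the union of their m/z values, filling gaps with 0."""
    mzs = sorted(set(query) | set(reference))
    return [query.get(mz, 0.0) for mz in mzs], [reference.get(mz, 0.0) for mz in mzs]

def cosine(q, r):
    """Inner Product family (assumed representative): cosine of the angle between vectors."""
    dot = sum(a * b for a, b in zip(q, r))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in r))
    return dot / norm if norm else 0.0

def pearson(q, r):
    """Correlative family (assumed representative): Pearson correlation of intensities."""
    if not q:
        return 0.0
    n = len(q)
    mq, mr = sum(q) / n, sum(r) / n
    cov = sum((a - mq) * (b - mr) for a, b in zip(q, r))
    sq = math.sqrt(sum((a - mq) ** 2 for a in q))
    sr = math.sqrt(sum((b - mr) ** 2 for b in r))
    return cov / (sq * sr) if sq and sr else 0.0

def intersection(q, r):
    """Intersection family (assumed representative): overlap after scaling each spectrum to sum 1."""
    tq, tr = sum(q), sum(r)
    if not tq or not tr:
        return 0.0
    return sum(min(a / tq, b / tr) for a, b in zip(q, r))

METRICS = {"cosine": cosine, "pearson": pearson, "intersection": intersection}

def rank_library(query, library, metric):
    """Score the query against every reference spectrum; best match first."""
    scores = [(name, METRICS[metric](*align(query, ref))) for name, ref in library.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy data (made up for illustration only).
query = {73: 100.0, 147: 40.0, 217: 15.0}
library = {
    "candidate_A": {73: 95.0, 147: 42.0, 217: 12.0},
    "candidate_B": {73: 20.0, 101: 100.0, 129: 60.0},
}

for name in METRICS:
    print(name, rank_library(query, library, name))
```

Because the abstract reports that no single metric is optimal for every query spectrum, the metric is passed in by name so it can be swapped without touching the library-search logic. Note that Pearson correlation here is computed over the union of m/z values, so absent peaks count as zeros; other conventions are possible.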

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
2204719
Report Number(s):
PNNL-SA-189894
Journal Information:
Metabolites, Vol. 13, Issue 10; ISSN 2218-1989
Publisher:
MDPI
Country of Publication:
United States
Language:
English

Similar Records

Evaluating Retention Index Score Assumptions to Refine GC–MS Metabolite Identification
Journal Article · May 2, 2023 · Analytical Chemistry

EI_MS_ML
Software · April 4, 2023

BLINK enables ultrafast tandem mass spectrometry cosine similarity scoring
Journal Article · August 18, 2023 · Scientific Reports