U.S. Department of Energy
Office of Scientific and Technical Information

PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements

Journal Article · Nature Communications
 [1];  [1];  [1];  [2];  [1];  [3];  [1];  [2];  [2];  [2];  [4];  [1];  [1];  [5];  [6];  [6];  [6];  [7];  [7];  [2];  [8];  [9];  [10];  [7];  [1];  [6];  [1];  [1]
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  2. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  3. Sandia National Lab. (SNL-CA), Livermore, CA (United States)
  4. Argonne National Lab. (ANL), Argonne, IL (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  5. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  6. Agilent Technologies, Santa Clara, CA (United States)
  7. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  8. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  9. Sandia National Lab. (SNL-CA), Livermore, CA (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  10. University of North Carolina, Chapel Hill, NC (United States)
Multidimensional measurements using state-of-the-art separations and mass spectrometry provide advantages in untargeted metabolomics analyses for studying biological and environmental biochemical processes. However, the lack of rapid analytical methods and robust algorithms for these heterogeneous data has limited their application. Here, we develop and evaluate a sensitive and high-throughput analytical and computational workflow to enable accurate metabolite profiling. Our workflow combines liquid chromatography, ion mobility spectrometry, and data-independent acquisition mass spectrometry with PeakDecoder, a machine learning-based algorithm that learns to distinguish true co-elution and co-mobility from raw data and calculates metabolite identification error rates. We apply PeakDecoder to metabolite profiling of various engineered strains of Aspergillus pseudoterreus, Aspergillus niger, Pseudomonas putida, and Rhodosporidium toruloides. Results, validated manually and against selected reaction monitoring and gas chromatography platforms, show that 2683 features could be confidently annotated and quantified across 116 microbial sample runs using a library built from 64 standards.
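As a rough illustration of two ideas named in the abstract, the sketch below scores co-elution as the Pearson correlation between a precursor and a fragment intensity trace, and estimates an identification error rate by counting decoy hits against target hits above a score threshold. This is not the PeakDecoder implementation: the function names (`coelution_score`, `estimate_fdr`), the use of Pearson correlation, and the simple decoy/target ratio are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def coelution_score(precursor: Sequence[float], fragment: Sequence[float]) -> float:
    """Pearson correlation of two intensity traces sampled at the same time points.

    Hypothetical helper: a value near 1.0 suggests the two ions rise and fall together.
    """
    n = len(precursor)
    if n == 0 or n != len(fragment):
        raise ValueError("traces must be non-empty and of equal length")
    mean_p = sum(precursor) / n
    mean_f = sum(fragment) / n
    cov = sum((p - mean_p) * (f - mean_f) for p, f in zip(precursor, fragment))
    var_p = sum((p - mean_p) ** 2 for p in precursor)
    var_f = sum((f - mean_f) ** 2 for f in fragment)
    if var_p == 0 or var_f == 0:
        return 0.0  # a flat trace carries no co-elution information
    return cov / math.sqrt(var_p * var_f)


def estimate_fdr(target_scores: Sequence[float],
                 decoy_scores: Sequence[float],
                 threshold: float) -> float:
    """Assumed estimator: (# decoys >= threshold) / (# targets >= threshold), capped at 1."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    # Made-up traces and scores, for demonstration only.
    print(coelution_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(estimate_fdr([0.9, 0.8, 0.7, 0.4], [0.75, 0.3, 0.2], threshold=0.7))
```

In a workflow of this kind, a learned classifier score would typically take the place of the raw correlation, and the threshold would be chosen to meet a desired error rate.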
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Office of Sustainable Transportation. Bioenergy Technologies Office (BETO); National Institutes of Health (NIH)
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1973113
Alternate ID(s):
OSTI ID: 1994378
OSTI ID: 1996726
Report Number(s):
PNNL-SA-174727
Journal Information:
Nature Communications, Vol. 14, Issue 1; ISSN 2041-1723
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
