DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements

Abstract

Multidimensional measurements using state-of-the-art separations and mass spectrometry provide advantages in untargeted metabolomics analyses for studying biological and environmental biochemical processes. However, the lack of rapid analytical methods and robust algorithms for these heterogeneous data has limited its application. Here, we develop and evaluate a sensitive and high-throughput analytical and computational workflow to enable accurate metabolite profiling. Our workflow combines liquid chromatography, ion mobility spectrometry and data-independent acquisition mass spectrometry with PeakDecoder, a machine learning-based algorithm that learns to distinguish true co-elution and co-mobility from raw data and calculates metabolite identification error rates. We apply PeakDecoder for metabolite profiling of various engineered strains of Aspergillus pseudoterreus, Aspergillus niger, Pseudomonas putida and Rhodosporidium toruloides. Results, validated manually and against selected reaction monitoring and gas-chromatography platforms, show that 2683 features could be confidently annotated and quantified across 116 microbial sample runs using a library built from 64 standards.

Authors:
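Bilbao, Aivett [1]; Munoz, Nathalie [1]; Kim, Joonhoon [1]; Orton, Daniel J. [2]; Gao, Yuqian [1]; Poorey, Kunal [3]; Pomraning, Kyle R. [1]; Weitz, Karl [2]; Burnet, Meagan [2]; Nicora, Carrie D. [2]; Wilton, Rosemarie [4]; Deng, Shuang [1]; Dai, Ziyu [1]; Oksen, Ethan [5]; Gee, Aaron [6]; Fasani, Rick A. [6]; Tsalenko, Anya [6]; Tanjore, Deepti [7]; Gardner, James [7]; Smith, Richard D. [2]; Michener, Joshua K. [8]; Gladden, John M. [9]; Baker, Erin S. [10]; Petzold, Christopher J. [7]; Kim, Young-Mo [1]; Apffel, Alex [6]; Magnuson, Jon K. [1]; Burnum-Johnson, Kristin E. [1]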
  1. Pacific Northwest National Laboratory (PNNL), Richland, WA (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  2. Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
  3. Sandia National Laboratories (SNL-CA), Livermore, CA (United States)
  4. Argonne National Laboratory (ANL), Argonne, IL (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  5. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
  6. Agilent Technologies, Santa Clara, CA (United States)
  7. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  8. Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  9. Sandia National Laboratories (SNL-CA), Livermore, CA (United States); USDOE Agile BioFoundry, Emeryville, CA (United States)
  10. University of North Carolina, Chapel Hill, NC (United States)
Publication Date:
2023-04-28
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States); Pacific Northwest National Laboratory (PNNL), Richland, WA (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Office of Sustainable Transportation. Bioenergy Technologies Office (BETO); National Institutes of Health (NIH)
OSTI Identifier:
1994378
Alternate Identifier(s):
OSTI ID: 1973113; OSTI ID: 1996726
Report Number(s):
PNNL-SA-174727
Journal ID: ISSN 2041-1723; 178960
Grant/Contract Number:  
AC02-06CH11357; P41 GM103493; AC05-76RL01830; AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Nature Communications
Additional Journal Information:
Journal Volume: 14; Journal Issue: 1; Journal ID: ISSN 2041-1723
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; HILIC; data-independent acquisition; ion mobility spectrometry; mass spectrometry; metabolomics; synthetic biology; Data processing; Machine learning

Citation Formats

MLA:
Bilbao, Aivett, Munoz, Nathalie, Kim, Joonhoon, Orton, Daniel J., Gao, Yuqian, Poorey, Kunal, Pomraning, Kyle R., Weitz, Karl, Burnet, Meagan, Nicora, Carrie D., Wilton, Rosemarie, Deng, Shuang, Dai, Ziyu, Oksen, Ethan, Gee, Aaron, Fasani, Rick A., Tsalenko, Anya, Tanjore, Deepti, Gardner, James, Smith, Richard D., Michener, Joshua K., Gladden, John M., Baker, Erin S., Petzold, Christopher J., Kim, Young-Mo, Apffel, Alex, Magnuson, Jon K., and Burnum-Johnson, Kristin E. PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements. United States: N. p., 2023. Web. doi:10.1038/s41467-023-37031-9.
APA:
Bilbao, Aivett, Munoz, Nathalie, Kim, Joonhoon, Orton, Daniel J., Gao, Yuqian, Poorey, Kunal, Pomraning, Kyle R., Weitz, Karl, Burnet, Meagan, Nicora, Carrie D., Wilton, Rosemarie, Deng, Shuang, Dai, Ziyu, Oksen, Ethan, Gee, Aaron, Fasani, Rick A., Tsalenko, Anya, Tanjore, Deepti, Gardner, James, Smith, Richard D., Michener, Joshua K., Gladden, John M., Baker, Erin S., Petzold, Christopher J., Kim, Young-Mo, Apffel, Alex, Magnuson, Jon K., & Burnum-Johnson, Kristin E. (2023). PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements. United States. https://doi.org/10.1038/s41467-023-37031-9
Chicago:
Bilbao, Aivett, Munoz, Nathalie, Kim, Joonhoon, Orton, Daniel J., Gao, Yuqian, Poorey, Kunal, Pomraning, Kyle R., Weitz, Karl, Burnet, Meagan, Nicora, Carrie D., Wilton, Rosemarie, Deng, Shuang, Dai, Ziyu, Oksen, Ethan, Gee, Aaron, Fasani, Rick A., Tsalenko, Anya, Tanjore, Deepti, Gardner, James, Smith, Richard D., Michener, Joshua K., Gladden, John M., Baker, Erin S., Petzold, Christopher J., Kim, Young-Mo, Apffel, Alex, Magnuson, Jon K., and Burnum-Johnson, Kristin E. 2023. "PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements". United States. https://doi.org/10.1038/s41467-023-37031-9. https://www.osti.gov/servlets/purl/1994378.
BibTeX:
@article{osti_1994378,
title = {PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements},
author = {Bilbao, Aivett and Munoz, Nathalie and Kim, Joonhoon and Orton, Daniel J. and Gao, Yuqian and Poorey, Kunal and Pomraning, Kyle R. and Weitz, Karl and Burnet, Meagan and Nicora, Carrie D. and Wilton, Rosemarie and Deng, Shuang and Dai, Ziyu and Oksen, Ethan and Gee, Aaron and Fasani, Rick A. and Tsalenko, Anya and Tanjore, Deepti and Gardner, James and Smith, Richard D. and Michener, Joshua K. and Gladden, John M. and Baker, Erin S. and Petzold, Christopher J. and Kim, Young-Mo and Apffel, Alex and Magnuson, Jon K. and Burnum-Johnson, Kristin E.},
abstractNote = {Multidimensional measurements using state-of-the-art separations and mass spectrometry provide advantages in untargeted metabolomics analyses for studying biological and environmental biochemical processes. However, the lack of rapid analytical methods and robust algorithms for these heterogeneous data has limited its application. Here, we develop and evaluate a sensitive and high-throughput analytical and computational workflow to enable accurate metabolite profiling. Our workflow combines liquid chromatography, ion mobility spectrometry and data-independent acquisition mass spectrometry with PeakDecoder, a machine learning-based algorithm that learns to distinguish true co-elution and co-mobility from raw data and calculates metabolite identification error rates. We apply PeakDecoder for metabolite profiling of various engineered strains of Aspergillus pseudoterreus, Aspergillus niger, Pseudomonas putida and Rhodosporidium toruloides. Results, validated manually and against selected reaction monitoring and gas-chromatography platforms, show that 2683 features could be confidently annotated and quantified across 116 microbial sample runs using a library built from 64 standards.},
doi = {10.1038/s41467-023-37031-9},
journal = {Nature Communications},
number = 1,
volume = 14,
place = {United States},
year = {2023},
month = apr
}

Works referenced in this record:

Targeted Data Extraction of the MS/MS Spectra Generated by Data-independent Acquisition: A New Concept for Consistent and Accurate Proteome Analysis
journal, January 2012

  • Gillet, Ludovic C.; Navarro, Pedro; Tate, Stephen
  • Molecular & Cellular Proteomics, Vol. 11, Issue 6
  • DOI: 10.1074/mcp.O111.016717

Building a global alliance of biofoundries
journal, May 2019


Evaluation of chromosomal insertion loci in the Pseudomonas putida KT2440 genome for predictable biosystems design
journal, December 2020

  • Chaves, Julie E.; Wilton, Rosemarie; Gao, Yuqian
  • Metabolic Engineering Communications, Vol. 11
  • DOI: 10.1016/j.mec.2020.e00139

Significance estimation for large scale metabolomics annotations by spectral matching
journal, November 2017


Ranking Fragment Ions Based on Outlier Detection for Improved Label-Free Quantification in Data-Independent Acquisition LC–MS/MS
journal, October 2015


MetDIA: Targeted Metabolite Extraction of Multiplexed MS/MS Spectra Generated by Data-Independent Acquisition
journal, August 2016


Establishing a synthetic pathway for high-level production of 3-hydroxypropionic acid in Saccharomyces cerevisiae via β-alanine
journal, January 2015


Multi-Omics Driven Metabolic Network Reconstruction and Analysis of Lignocellulosic Carbon Utilization in Rhodosporidium toruloides
journal, January 2021

  • Kim, Joonhoon; Coradetti, Samuel T.; Kim, Young-Mo
  • Frontiers in Bioengineering and Biotechnology, Vol. 8
  • DOI: 10.3389/fbioe.2020.612832

mProphet: automated data processing and statistical validation for large-scale SRM experiments
journal, March 2011

  • Reiter, Lukas; Rinner, Oliver; Picotti, Paola
  • Nature Methods, Vol. 8, Issue 5
  • DOI: 10.1038/nmeth.1584

Further engineering of R. toruloides for the production of terpenes from lignocellulosic biomass
journal, April 2021


Diel metabolomics analysis of a hot spring chlorophototrophic microbial mat leads to new hypotheses of community member metabolisms
journal, April 2015


Skyline: an open source document editor for creating and analyzing targeted proteomics experiments
journal, February 2010


Identification and microbial production of a terpene-based advanced biofuel
journal, September 2011

  • Peralta-Yahya, Pamela P.; Ouellet, Mario; Chan, Rossana
  • Nature Communications, Vol. 2, Issue 1
  • DOI: 10.1038/ncomms1494

Multi-omics analysis unravels a segregated metabolic flux network that tunes co-utilization of sugar and aromatic carbons in Pseudomonas putida
journal, April 2019

  • Kukurugya, Matthew A.; Mendonca, Caroll M.; Solhtalab, Mina
  • Journal of Biological Chemistry, Vol. 294, Issue 21
  • DOI: 10.1074/jbc.RA119.007885

Functional genomics of lipid metabolism in the oleaginous yeast Rhodosporidium toruloides
journal, March 2018

  • Coradetti, Samuel T.; Pinel, Dominic; Geiselman, Gina M.
  • eLife, Vol. 7
  • DOI: 10.7554/eLife.32110

Structure Annotation of All Mass Spectra in Untargeted Metabolomics
journal, January 2019


MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses
journal, May 2016


Feature Selection with the Boruta Package
journal, January 2010

  • Kursa, Miron B.; Rudnicki, Witold R.
  • Journal of Statistical Software, Vol. 36, Issue 11
  • DOI: 10.18637/jss.v036.i11

From dirt to industrial applications: Pseudomonas putida as a Synthetic Biology chassis for hosting harsh biochemical reactions
journal, October 2016


High-Throughput Large-Scale Targeted Proteomics Assays for Quantifying Pathway Proteins in Pseudomonas putida KT2440
journal, December 2020

  • Gao, Yuqian; Fillmore, Thomas L.; Munoz, Nathalie
  • Frontiers in Bioengineering and Biotechnology, Vol. 8
  • DOI: 10.3389/fbioe.2020.603488

Deep learning, reinforcement learning, and world models
journal, August 2022


Target-Decoy-Based False Discovery Rate Estimation for Large-Scale Metabolite Identification
journal, May 2018


A Highly Conserved Signal Controls Degradation of 3-Hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) Reductase in Eukaryotes
journal, October 1999

  • Gardner, Richard G.; Hampton, Randolph Y.
  • Journal of Biological Chemistry, Vol. 274, Issue 44
  • DOI: 10.1074/jbc.274.44.31671

Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics
journal, May 2018


FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry
journal, November 2016

  • Palmer, Andrew; Phapale, Prasad; Chernyavsky, Ilya
  • Nature Methods, Vol. 14, Issue 1
  • DOI: 10.1038/nmeth.4072

OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data
journal, March 2014

  • Röst, Hannes L.; Rosenberger, George; Navarro, Pedro
  • Nature Biotechnology, Vol. 32, Issue 3
  • DOI: 10.1038/nbt.2841

Engineering the Oleaginous Yeast Rhodosporidium toruloides for Improved Resistance Against Inhibitors in Biomass Hydrolysates
journal, November 2021

  • Lyu, Liting; Chu, Yadong; Zhang, Sufang
  • Frontiers in Bioengineering and Biotechnology, Vol. 9
  • DOI: 10.3389/fbioe.2021.768934

A community-driven reconstruction of the Aspergillus niger metabolic network
journal, September 2018

  • Brandl, Julian; Aguilar-Pontes, Maria Victoria; Schäpe, Paul
  • Fungal Biology and Biotechnology, Vol. 5, Issue 1
  • DOI: 10.1186/s40694-018-0060-7

COBRApy: COnstraints-Based Reconstruction and Analysis for Python
journal, January 2013

  • Ebrahim, Ali; Lerman, Joshua A.; Palsson, Bernhard O.
  • BMC Systems Biology, Vol. 7, Issue 1
  • DOI: 10.1186/1752-0509-7-74

An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates
journal, September 2011


XY-Meta: A High-Efficiency Search Engine for Large-Scale Metabolome Annotation with Accurate FDR Estimation
journal, March 2020


In silico-guided engineering of Pseudomonas putida towards growth under micro-oxic conditions
journal, October 2019

  • Kampers, Linde F. C.; van Heck, Ruben G. A.; Donati, Stefano
  • Microbial Cell Factories, Vol. 18, Issue 1
  • DOI: 10.1186/s12934-019-1227-5

Rhodosporidium toruloides: a new platform organism for conversion of lignocellulose into terpene biofuels and bioproducts
journal, October 2017


pmartR: Quality Control and Statistics for Mass Spectrometry-Based Biological Data
journal, January 2019

  • Stratton, Kelly G.; Webb-Robertson, Bobbie-Jo M.; McCue, Lee Ann
  • Journal of Proteome Research, Vol. 18, Issue 3
  • DOI: 10.1021/acs.jproteome.8b00760

Challenges, progress and promises of metabolite annotation for LC–MS-based metabolomics
journal, February 2019


Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification
journal, December 2021


Processing strategies and software solutions for data-independent acquisition in mass spectrometry
journal, February 2015


Carotenoid-based phenotypic screen of the yeast deletion collection reveals new genes with roles in isoprenoid production
journal, January 2013


Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses
journal, August 2017

  • Rosenberger, George; Bludau, Isabell; Schmitt, Uwe
  • Nature Methods, Vol. 14, Issue 9
  • DOI: 10.1038/nmeth.4398

Rapid screening methods for yeast sub‐metabolome analysis with a high‐resolution ion mobility quadrupole time‐of‐flight mass spectrometer
journal, May 2019

  • Mairinger, Teresa; Kurulugama, Ruwan; Causon, Tim J.
  • Rapid Communications in Mass Spectrometry, Vol. 33, Issue S2
  • DOI: 10.1002/rcm.8420

Posterior Error Probabilities and False Discovery Rates: Two Sides of the Same Coin
journal, January 2008

  • Käll, Lukas; Storey, John D.; MacCoss, Michael J.
  • Journal of Proteome Research, Vol. 7, Issue 1
  • DOI: 10.1021/pr700739d

Mass spectrometry-based metabolomics in health and medical science: a systematic review
journal, January 2020

  • Zhang, Xi-wu; Li, Qiu-han; Xu, Zuo-di
  • RSC Advances, Vol. 10, Issue 6
  • DOI: 10.1039/C9RA08985C

DIAMetAlyzer allows automated false-discovery rate-controlled analysis for data-independent acquisition in metabolomics
journal, March 2022


Ion mobility spectrometry and the omics: Distinguishing isomers, molecular classes and contaminant ions in complex samples
journal, July 2019

  • Burnum-Johnson, Kristin E.; Zheng, Xueyun; Dodds, James N.
  • TrAC Trends in Analytical Chemistry, Vol. 116
  • DOI: 10.1016/j.trac.2019.04.022

Bacillus subtilis biofilm matrix components target seed oil bodies to promote growth and anti-fungal resistance in melon
journal, June 2022

  • Berlanga-Clavero, M. V.; Molina-Santiago, C.; Caraballo-Rodríguez, A. M.
  • Nature Microbiology, Vol. 7, Issue 7
  • DOI: 10.1038/s41564-022-01134-8

PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints
journal, December 2010

  • Yap, Chun Wei
  • Journal of Computational Chemistry, Vol. 32, Issue 7
  • DOI: 10.1002/jcc.21707

Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways
journal, August 2015


A lipidome atlas in MS-DIAL 4
journal, June 2020

  • Tsugawa, Hiroshi; Ikeda, Kazutaka; Takahashi, Mikiko
  • Nature Biotechnology, Vol. 38, Issue 10
  • DOI: 10.1038/s41587-020-0531-2

Machine Learning Applications for Mass Spectrometry-Based Metabolomics
journal, June 2020


Spectrum-based Method to Generate Good Decoy Libraries for Spectral Library Searching in Peptide Identifications
journal, April 2013

  • Cheng, Chia-Ying; Tsai, Chia-Feng; Chen, Yu-Ju
  • Journal of Proteome Research, Vol. 12, Issue 5
  • DOI: 10.1021/pr301039b

DecoID improves identification rates in metabolomics through database-assisted MS/MS deconvolution
journal, July 2021


A Preprocessing Tool for Enhanced Ion Mobility–Mass Spectrometry-Based Omics Workflows
journal, August 2021


DaDIA: Hybridizing Data-Dependent and Data-Independent Acquisition Modes for Generating High-Quality Metabolomic Data
journal, January 2021


Combined Statistical Analyses of Peptide Intensities and Peptide Occurrences Improves Identification of Significant Peptides from MS-Based Proteomics Data
journal, November 2010

  • Webb-Robertson, Bobbie-Jo M.; McCue, Lee Ann; Waters, Katrina M.
  • Journal of Proteome Research, Vol. 9, Issue 11
  • DOI: 10.1021/pr1005247

Geranylgeranyl Pyrophosphate Is a Potent Regulator of HRD-dependent 3-Hydroxy-3-methylglutaryl-CoA Reductase Degradation in Yeast
journal, December 2009

  • Garza, Renee M.; Tran, Peter N.; Hampton, Randolph Y.
  • Journal of Biological Chemistry, Vol. 284, Issue 51
  • DOI: 10.1074/jbc.M109.023994

BioCompoundML: A General Biofuel Property Screening Tool for Biological Molecules Using Random Forest Classifiers
journal, September 2016


MetaboliteDetector: Comprehensive Analysis Tool for Targeted and Nontargeted GC/MS Based Metabolome Analysis
journal, May 2009

  • Hiller, Karsten; Hangebrauk, Jasper; Jäger, Christian
  • Analytical Chemistry, Vol. 81, Issue 9
  • DOI: 10.1021/ac802689c

Integration of Proteomics and Metabolomics Into the Design, Build, Test, Learn Cycle to Improve 3-Hydroxypropionic Acid Production in Aspergillus pseudoterreus
journal, April 2021

  • Pomraning, Kyle R.; Dai, Ziyu; Munoz, Nathalie
  • Frontiers in Bioengineering and Biotechnology, Vol. 9
  • DOI: 10.3389/fbioe.2021.603832

Using Skyline to Analyze Data-Containing Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry Dimensions
journal, July 2018

  • MacLean, Brendan X.; Pratt, Brian S.; Egertson, Jarrett D.
  • Journal of the American Society for Mass Spectrometry, Vol. 29, Issue 11
  • DOI: 10.1007/s13361-018-2028-5

An Interlaboratory Evaluation of Drift Tube Ion Mobility–Mass Spectrometry Collision Cross Section Measurements
journal, August 2017


High‐quality genome‐scale metabolic modelling of Pseudomonas putida highlights its broad metabolic capabilities
journal, November 2019

  • Nogales, Juan; Mueller, Joshua; Gudmundsson, Steinn
  • Environmental Microbiology, Vol. 22, Issue 1
  • DOI: 10.1111/1462-2920.14843