DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Metabolite discovery through global annotation of untargeted metabolomics data

Abstract

Liquid chromatography–high-resolution mass spectrometry (LC-MS)-based metabolomics aims to identify and quantify all metabolites, but most LC-MS peaks remain unidentified. Here we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. The approach aims to generate, for all experimentally observed ion peaks, annotations that match the measured masses, retention times and (when available) tandem mass spectrometry fragmentation patterns. Peaks are connected based on mass differences reflecting adduction, fragmentation, isotopes, or feasible biochemical transformations. Global optimization generates a single network linking most observed ion peaks, enhances peak assignment accuracy, and produces chemically informative peak–peak relationships, including for peaks lacking tandem mass spectrometry spectra. Applying this approach to yeast and mouse data, we identified five previously unrecognized metabolites (thiamine derivatives and N-glucosyl-taurine). Isotope tracer studies indicate active flux through these metabolites. Furthermore, NetID applies existing metabolomic knowledge and global optimization to substantially improve annotation coverage and accuracy in untargeted metabolomics datasets, facilitating metabolite discovery.
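
The peak-connection step described in the abstract can be illustrated with a minimal sketch. It is not the NetID implementation: the rule table, the function name `connect_peaks`, the tolerances, and the example peaks are all assumptions chosen for illustration, and the global optimization that selects among candidate annotations is not shown. The sketch only links pairs of peaks whose mass difference matches a known difference, requiring co-elution for isotope, adduct, and fragment relationships.

```python
from itertools import combinations

# Hypothetical rule table: name -> (mass difference in Da, requires co-elution).
# The mass differences are standard monoisotopic values; the table is illustrative.
RULES = {
    "13C isotope": (1.003355, True),
    "Na-H adduct": (21.981944, True),
    "H2O loss": (18.010565, True),            # in-source fragment, same retention time
    "CH2 (methylation)": (14.015650, False),  # biochemical transformation
}


def connect_peaks(peaks, ppm=5.0, rt_tol=0.1):
    """Return candidate edges (i, j, rule name) between peaks.

    peaks  : list of (mass, retention time in minutes) tuples
    ppm    : mass tolerance in parts per million (assumed value)
    rt_tol : retention-time window in minutes for co-eluting pairs (assumed value)
    """
    edges = []
    for (i, (m_i, rt_i)), (j, (m_j, rt_j)) in combinations(enumerate(peaks), 2):
        delta = abs(m_i - m_j)
        tol = max(m_i, m_j) * ppm * 1e-6
        for name, (diff, needs_coelution) in RULES.items():
            if abs(delta - diff) > tol:
                continue  # mass difference does not match this rule
            if needs_coelution and abs(rt_i - rt_j) > rt_tol:
                continue  # isotopes, adducts and fragments must co-elute
            edges.append((i, j, name))
    return edges


# Made-up example peaks: a parent, its 13C isotope, a sodium adduct,
# and a methylated derivative eluting at a different time.
peaks = [
    (180.063400, 5.00),
    (181.066755, 5.00),
    (202.045344, 5.02),
    (194.079050, 7.50),
]
for edge in connect_peaks(peaks):
    print(edge)
```

In this sketch every matching pair becomes a candidate edge. Choosing a single consistent annotation per peak across the whole network is the role of the global optimization the abstract refers to.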

Authors:
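Chen, Li [1]; Lu, Wenyun [2]; Wang, Lin [2]; Xing, Xi [2]; Chen, Ziyang [1]; Teng, Xin [2]; Zeng, Xianfeng [2]; Muscarella, Antonio D. [2]; Shen, Yihui [2]; Cowan, Alexis [2]; McReynolds, Melanie R. [2]; Kennedy, Brandon J. [2]; Lato, Ashley M. [3]; Campagna, Shawn R. [3]; Singh, Mona [2]; Rabinowitz, Joshua D. [4]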
  1. Fudan University, Shanghai (China); Princeton University, NJ (United States)
  2. Princeton University, NJ (United States)
  3. University of Tennessee, Knoxville, TN (United States)
  4. Princeton University, NJ (United States); Ludwig Institute for Cancer Research, Princeton, NJ (United States)
Publication Date:
2021-10-28
Research Org.:
CABBI, Urbana, IL (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1855984
Grant/Contract Number:  
SC0018420
Resource Type:
Accepted Manuscript
Journal Name:
Nature Methods
Additional Journal Information:
Journal Volume: 18; Journal Issue: 11; Journal ID: ISSN 1548-7091
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Computational models; Metabolomics

Citation Formats

Chen, Li, Lu, Wenyun, Wang, Lin, Xing, Xi, Chen, Ziyang, Teng, Xin, Zeng, Xianfeng, Muscarella, Antonio D., Shen, Yihui, Cowan, Alexis, McReynolds, Melanie R., Kennedy, Brandon J., Lato, Ashley M., Campagna, Shawn R., Singh, Mona, and Rabinowitz, Joshua D. Metabolite discovery through global annotation of untargeted metabolomics data. United States: N. p., 2021. Web. doi:10.1038/s41592-021-01303-3.
Chen, Li, Lu, Wenyun, Wang, Lin, Xing, Xi, Chen, Ziyang, Teng, Xin, Zeng, Xianfeng, Muscarella, Antonio D., Shen, Yihui, Cowan, Alexis, McReynolds, Melanie R., Kennedy, Brandon J., Lato, Ashley M., Campagna, Shawn R., Singh, Mona, & Rabinowitz, Joshua D. (2021). Metabolite discovery through global annotation of untargeted metabolomics data. United States. https://doi.org/10.1038/s41592-021-01303-3
Chen, Li, Lu, Wenyun, Wang, Lin, Xing, Xi, Chen, Ziyang, Teng, Xin, Zeng, Xianfeng, Muscarella, Antonio D., Shen, Yihui, Cowan, Alexis, McReynolds, Melanie R., Kennedy, Brandon J., Lato, Ashley M., Campagna, Shawn R., Singh, Mona, and Rabinowitz, Joshua D. 2021. "Metabolite discovery through global annotation of untargeted metabolomics data". United States. https://doi.org/10.1038/s41592-021-01303-3. https://www.osti.gov/servlets/purl/1855984.
@article{osti_1855984,
title = {Metabolite discovery through global annotation of untargeted metabolomics data},
author = {Chen, Li and Lu, Wenyun and Wang, Lin and Xing, Xi and Chen, Ziyang and Teng, Xin and Zeng, Xianfeng and Muscarella, Antonio D. and Shen, Yihui and Cowan, Alexis and McReynolds, Melanie R. and Kennedy, Brandon J. and Lato, Ashley M. and Campagna, Shawn R. and Singh, Mona and Rabinowitz, Joshua D.},
abstractNote = {Liquid chromatography–high-resolution mass spectrometry (LC-MS)-based metabolomics aims to identify and quantify all metabolites, but most LC-MS peaks remain unidentified. Here we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. The approach aims to generate, for all experimentally observed ion peaks, annotations that match the measured masses, retention times and (when available) tandem mass spectrometry fragmentation patterns. Peaks are connected based on mass differences reflecting adduction, fragmentation, isotopes, or feasible biochemical transformations. Global optimization generates a single network linking most observed ion peaks, enhances peak assignment accuracy, and produces chemically informative peak–peak relationships, including for peaks lacking tandem mass spectrometry spectra. Applying this approach to yeast and mouse data, we identified five previously unrecognized metabolites (thiamine derivatives and N-glucosyl-taurine). Isotope tracer studies indicate active flux through these metabolites. Furthermore, NetID applies existing metabolomic knowledge and global optimization to substantially improve annotation coverage and accuracy in untargeted metabolomics datasets, facilitating metabolite discovery.},
doi = {10.1038/s41592-021-01303-3},
journal = {Nature Methods},
number = 11,
volume = 18,
place = {United States},
year = {2021},
month = {10}
}
