OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

Abstract

Metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs.
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
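
The accurate-mass annotation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the MINE API or the authors' implementation: the `Candidate` class, the `find_candidates` function, the three-compound toy database, the [M+H]+ adduct assumption, and the 5 ppm tolerance are all hypothetical choices made for illustration. The sketch only shows the general idea of proposing candidate structures whose monoisotopic mass falls within a ppm window of an observed peak.

```python
from dataclasses import dataclass

# Mass of a proton in Da; assumed here because the sketch only handles [M+H]+ ions.
PROTON_MASS = 1.007276


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one database compound."""
    name: str
    monoisotopic_mass: float  # neutral monoisotopic mass in Da


# Toy stand-in for a compound database (not MINE data).
TOY_DATABASE = [
    Candidate("D-Glucose", 180.06339),
    Candidate("D-Fructose", 180.06339),  # isomer of glucose, same exact mass
    Candidate("Citric acid", 192.02700),
    Candidate("L-Tryptophan", 204.08988),
]


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def find_candidates(mz: float, database, tolerance_ppm: float = 5.0):
    """Return database entries within tolerance_ppm of the peak, best match first.

    Assumes the observed peak is a singly charged [M+H]+ ion.
    """
    neutral_mass = mz - PROTON_MASS
    hits = [
        c for c in database
        if abs(ppm_error(neutral_mass, c.monoisotopic_mass)) <= tolerance_ppm
    ]
    return sorted(
        hits, key=lambda c: abs(ppm_error(neutral_mass, c.monoisotopic_mass))
    )


if __name__ == "__main__":
    for hit in find_candidates(181.0707, TOY_DATABASE):
        print(hit.name, hit.monoisotopic_mass)
```

Because isomers share an exact mass, a mass-only search returns several candidates per peak. This is why the number of candidates returned per spectrum, as compared in the abstract, matters for downstream interpretation.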

Authors:
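Jeffryes, James G. [1]; Colastani, Ricardo L. [2]; Elbadawi-Sidhu, Mona [3]; Kind, Tobias [3]; Niehaus, Thomas D. [4]; Broadbelt, Linda J. [5]; Hanson, Andrew D. [4]; Fiehn, Oliver [6]; Tyo, Keith E. J. [5]; Henry, Christopher S. [2]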
  1. Northwestern Univ., Evanston, IL (United States); Argonne National Lab., Argonne, IL (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
  3. Univ. of California, Davis, CA (United States)
  4. Univ. of Florida, Gainesville, FL (United States)
  5. Northwestern Univ., Evanston, IL (United States)
  6. Univ. of California, Davis, CA (United States); King Abdulaziz Univ., Jeddah (Saudi Arabia)
Publication Date:
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1221820
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Cheminformatics
Additional Journal Information:
Journal Volume: 7; Journal Issue: 1; Journal ID: ISSN 1758-2946
Publisher:
Chemistry Central Ltd.
Country of Publication:
United States
Language:
English
Subject:
37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY; 97 MATHEMATICS AND COMPUTING; enzyme promiscuity; untargeted metabolomics; liquid chromatography–mass spectrometry; metabolite identification

Citation Formats

Jeffryes, James G., Colastani, Ricardo L., Elbadawi-Sidhu, Mona, Kind, Tobias, Niehaus, Thomas D., Broadbelt, Linda J., Hanson, Andrew D., Fiehn, Oliver, Tyo, Keith E. J., and Henry, Christopher S. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics. United States: N. p., 2015. Web. doi:10.1186/s13321-015-0087-1.
Jeffryes, James G., Colastani, Ricardo L., Elbadawi-Sidhu, Mona, Kind, Tobias, Niehaus, Thomas D., Broadbelt, Linda J., Hanson, Andrew D., Fiehn, Oliver, Tyo, Keith E. J., & Henry, Christopher S. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics. United States. doi:10.1186/s13321-015-0087-1.
Jeffryes, James G., Colastani, Ricardo L., Elbadawi-Sidhu, Mona, Kind, Tobias, Niehaus, Thomas D., Broadbelt, Linda J., Hanson, Andrew D., Fiehn, Oliver, Tyo, Keith E. J., and Henry, Christopher S. "MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics". United States. doi:10.1186/s13321-015-0087-1. https://www.osti.gov/servlets/purl/1221820.
@article{osti_1221820,
title = {MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics},
author = {Jeffryes, James G. and Colastani, Ricardo L. and Elbadawi-Sidhu, Mona and Kind, Tobias and Niehaus, Thomas D. and Broadbelt, Linda J. and Hanson, Andrew D. and Fiehn, Oliver and Tyo, Keith E. J. and Henry, Christopher S.},
abstractNote = {Metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs.
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.},
doi = {10.1186/s13321-015-0087-1},
journal = {Journal of Cheminformatics},
issn = {1758-2946},
number = 1,
volume = 7,
place = {United States},
year = {2015},
month = {8}
}

Citation Metrics:
Cited by: 32 works
Citation information provided by
Web of Science

Figures / Tables:

Fig. 1: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.

Works referenced in this record:

In silico fragmentation for computer assisted identification of metabolite mass spectra
journal, March 2010

  • Wolf, Sebastian; Schmidt, Stephan; Müller-Hannemann, Matthias
  • BMC Bioinformatics, Vol. 11, Issue 1
  • DOI: 10.1186/1471-2105-11-148

MassBank: a public repository for sharing mass spectral data for life sciences
journal, July 2010

  • Horai, Hisayuki; Arita, Masanori; Kanaya, Shigehiko
  • Journal of Mass Spectrometry, Vol. 45, Issue 7
  • DOI: 10.1002/jms.1777

YMDB: the Yeast Metabolome Database
journal, November 2011

  • Jewison, T.; Knox, C.; Neveu, V.
  • Nucleic Acids Research, Vol. 40, Issue D1
  • DOI: 10.1093/nar/gkr916

Extending Biochemical Databases by Metabolomic Surveys
journal, May 2011

  • Fiehn, Oliver; Barupal, Dinesh K.; Kind, Tobias
  • Journal of Biological Chemistry, Vol. 286, Issue 27
  • DOI: 10.1074/jbc.R110.173617

SMILES. 2. Algorithm for generation of unique SMILES notation
journal, May 1989

  • Weininger, David; Weininger, Arthur; Weininger, Joseph L.
  • Journal of Chemical Information and Modeling, Vol. 29, Issue 2
  • DOI: 10.1021/ci00062a008

Applications of liquid chromatography coupled to mass spectrometry-based metabolomics in clinical chemistry and toxicology: A review
journal, January 2011


Catalytic promiscuity and the evolution of new enzymatic activities
journal, April 1999


Metabolite identification and molecular fingerprint prediction through machine learning
journal, July 2012


A directed-overflow and damage-control N-glycosidase in riboflavin biosynthesis
journal, February 2015

  • Frelin, Océane; Huang, Lili; Hasnain, Ghulam
  • Biochemical Journal, Vol. 466, Issue 1
  • DOI: 10.1042/BJ20141237

Systematic Applications of Metabolomics in Metabolic Engineering
journal, December 2012


Metabolite Identification through Machine Learning— Tackling CASMI Challenge Using FingerID
journal, June 2013

  • Shen, Huibin; Zamboni, Nicola; Heinonen, Markus
  • Metabolites, Vol. 3, Issue 2
  • DOI: 10.3390/metabo3020484

Data, information, knowledge and principle: back to metabolism in KEGG
journal, November 2013

  • Kanehisa, Minoru; Goto, Susumu; Sato, Yoko
  • Nucleic Acids Research, Vol. 42, Issue D1
  • DOI: 10.1093/nar/gkt1076

LipidBlast in silico tandem mass spectrometry database for lipid identification
journal, June 2013

  • Kind, Tobias; Liu, Kwang-Hyeon; Lee, Do Yup
  • Nature Methods, Vol. 10, Issue 8
  • DOI: 10.1038/nmeth.2551

The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters
journal, May 2011

  • Bar-Even, Arren; Noor, Elad; Savir, Yonatan
  • Biochemistry, Vol. 50, Issue 21
  • DOI: 10.1021/bi2002289

Rethinking Mass Spectrometry-Based Small Molecule Identification Strategies in Metabolomics
journal, January 2014


Mass-spectrometry-based metabolomics: limitations and recommendations for future progress with particular focus on nutrition research
journal, June 2009


InChI - the worldwide chemical structure identifier standard
journal, January 2013

  • Heller, Stephen; McNaught, Alan; Stein, Stephen
  • Journal of Cheminformatics, Vol. 5, Issue 1
  • DOI: 10.1186/1758-2946-5-7

In Silico Enzymatic Synthesis of a 400 000 Compound Biochemical Database for Nontargeted Metabolomics
journal, August 2013

  • Menikarachchi, Lochana C.; Hill, Dennis W.; Hamdalla, Mai A.
  • Journal of Chemical Information and Modeling, Vol. 53, Issue 9
  • DOI: 10.1021/ci400368v

Theoretical Considerations and Computational Analysis of the Complexity in Polyketide Synthesis Pathways
journal, July 2005

  • González-Lergier, Joanna; Broadbelt, Linda J.; Hatzimanikatis, Vassily
  • Journal of the American Chemical Society, Vol. 127, Issue 27
  • DOI: 10.1021/ja051586y

MIDAS: A Database-Searching Algorithm for Metabolite Identification in Metabolomics
journal, September 2014

  • Wang, Yingfeng; Kora, Guruprasad; Bowen, Benjamin P.
  • Analytical Chemistry, Vol. 86, Issue 19
  • DOI: 10.1021/ac5014783

Systematic Structural Characterization of Metabolites in Arabidopsis via Candidate Substrate-Product Pair Networks
journal, March 2014


Prediction of metabolic reactions based on atomic and molecular properties of small-molecule compounds
journal, April 2011


Natural product-likeness score revisited: an open-source, open-data implementation
journal, January 2012

  • Jayaseelan, Kalai Vanii; Moreno, Pablo; Truszkowski, Andreas
  • BMC Bioinformatics, Vol. 13, Issue 1
  • DOI: 10.1186/1471-2105-13-106

Estimation of Kováts Retention Indices Using Group Contributions
journal, March 2007

  • Stein, Stephen E.; Babushok, Valeri I.; Brown, Robert L.
  • Journal of Chemical Information and Modeling, Vol. 47, Issue 3
  • DOI: 10.1021/ci600548y

MolFind: A Software Package Enabling HPLC/MS-Based Identification of Unknown Chemical Structures
journal, September 2012

  • Menikarachchi, Lochana C.; Cawley, Shannon; Hill, Dennis W.
  • Analytical Chemistry, Vol. 84, Issue 21
  • DOI: 10.1021/ac302048x

PathPred: an enzyme-catalyzed metabolic pathway prediction server
journal, April 2010

  • Moriya, Y.; Shigemizu, D.; Hattori, M.
  • Nucleic Acids Research, Vol. 38, Issue Web Server
  • DOI: 10.1093/nar/gkq318

Metabolite and reaction inference based on enzyme specificities
journal, August 2009


CASMI: And the Winner is . . .
journal, May 2013


Network Context and Selection in the Evolution to Enzyme Specificity
journal, August 2012


The Rise of Chemodiversity in Plants
journal, June 2012


The University of Minnesota Pathway Prediction System: multi-level prediction and visualization
journal, April 2011

  • Gao, J.; Ellis, L. B. M.; Wackett, L. P.
  • Nucleic Acids Research, Vol. 39, Issue suppl
  • DOI: 10.1093/nar/gkr200

Data-driven extraction of relative reasoning rules to limit combinatorial explosion in biodegradation pathway prediction
journal, July 2008


Metabolomics in nutritional epidemiology: identifying metabolites associated with diet and quantifying their potential to uncover diet-disease relations in populations
journal, April 2014

  • Guertin, Kristin A.; Moore, Steven C.; Sampson, Joshua N.
  • The American Journal of Clinical Nutrition, Vol. 100, Issue 1
  • DOI: 10.3945/ajcn.113.078758

Genome-Scale Thermodynamic Analysis of Escherichia coli Metabolism
journal, February 2006

  • Henry, Christopher S.; Jankowski, Matthew D.; Broadbelt, Linda J.
  • Biophysical Journal, Vol. 90, Issue 4, p. 1453-1461
  • DOI: 10.1529/biophysj.105.071720

Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit
journal, March 2008

  • O'Boyle, Noel M.; Morley, Chris; Hutchison, Geoffrey R.
  • Chemistry Central Journal, Vol. 2, Issue 1
  • DOI: 10.1186/1752-153X-2-5

Mass Spectral Reference Libraries: An Ever-Expanding Resource for Chemical Identification
journal, July 2012


A systematic comparison of the MetaCyc and KEGG pathway databases
journal, January 2013


BKM-react, an integrated biochemical reaction database
journal, January 2011

  • Lang, Maren; Stelzer, Michael; Schomburg, Dietmar
  • BMC Biochemistry, Vol. 12, Issue 1
  • DOI: 10.1186/1471-2091-12-42

Metabolomics: the apogee of the omics trilogy
journal, March 2012

  • Patti, Gary J.; Yanes, Oscar; Siuzdak, Gary
  • Nature Reviews Molecular Cell Biology, Vol. 13, Issue 4
  • DOI: 10.1038/nrm3314

MyCompoundID: Using an Evidence-Based Metabolome Library for Metabolite Identification
journal, March 2013

  • Li, Liang; Li, Ronghong; Zhou, Jianjun
  • Analytical Chemistry, Vol. 85, Issue 6
  • DOI: 10.1021/ac400099b

MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases
journal, January 2012

  • Kumar, Akhil; Suthers, Patrick F.; Maranas, Costas D.
  • BMC Bioinformatics, Vol. 13, Issue 1
  • DOI: 10.1186/1471-2105-13-6

EcoCyc: fusing model organism databases with systems biology
journal, November 2012

  • Keseler, Ingrid M.; Mackie, Amanda; Peralta-Gil, Martin
  • Nucleic Acids Research, Vol. 41, Issue D1
  • DOI: 10.1093/nar/gks1027

    Works referencing / citing this record:

    MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics [Supplemental Data]
    dataset, August 2015

    • Jeffryes, James; Colastani, Ricardo; Elbadawi-Sidhu, Mona
    • figshare-Supplementary information for journal article at DOI: 10.1186/s13321-015-0087-1, 2 DOCX files
    • DOI: 10.6084/m9.figshare.c.3697504

    HMDB 4.0: the human metabolome database for 2018
    journal, November 2017

    • Wishart, David S.; Feunang, Yannick Djoumbou; Marcu, Ana
    • Nucleic Acids Research, Vol. 46, Issue D1
    • DOI: 10.1093/nar/gkx1089

    RetSynth: determining all optimal and sub-optimal synthetic pathways that facilitate synthesis of target compounds in chassis organisms
    journal, September 2019


    Towards creating an extended metabolic model (EMM) for E. coli using enzyme promiscuity prediction and metabolomics data
    journal, June 2019

    • Amin, Sara A.; Chavez, Elizabeth; Porokhin, Vladimir
    • Microbial Cell Factories, Vol. 18, Issue 1
    • DOI: 10.1186/s12934-019-1156-3

    Critical Assessment of Small Molecule Identification 2016: automated methods
    journal, March 2017

    • Schymanski, Emma L.; Ruttkies, Christoph; Krauss, Martin
    • Journal of Cheminformatics, Vol. 9, Issue 1
    • DOI: 10.1186/s13321-017-0207-1

    A review of parameters and heuristics for guiding metabolic pathfinding
    journal, September 2017


    Molecular structures enumeration and virtual screening in the chemical space with RetroPath2.0
    journal, December 2017

    • Koch, Mathilde; Duigou, Thomas; Carbonell, Pablo
    • Journal of Cheminformatics, Vol. 9, Issue 1
    • DOI: 10.1186/s13321-017-0252-9

    Expanding Metabolic Capabilities Using Novel Pathway Designs: Computational Tools and Case Studies
    journal, July 2019

    • Saa, Pedro A.; Cortés, María P.; López, Javiera
    • Biotechnology Journal, Vol. 14, Issue 9
    • DOI: 10.1002/biot.201800734

    Identification of small molecules using accurate mass MS/MS search
    journal, April 2017

    • Kind, Tobias; Tsugawa, Hiroshi; Cajka, Tomas
    • Mass Spectrometry Reviews, Vol. 37, Issue 4
    • DOI: 10.1002/mas.21535

    Identifying metabolites by integrating metabolome databases with mass spectrometry cheminformatics
    journal, November 2017

    • Lai, Zijuan; Tsugawa, Hiroshi; Wohlgemuth, Gert
    • Nature Methods, Vol. 15, Issue 1
    • DOI: 10.1038/nmeth.4512

    SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information
    journal, March 2019


    Comprehensive comparison of in silico MS/MS fragmentation tools of the CASMI contest: database boosting is needed to achieve 93% accuracy
    journal, May 2017

    • Blaženović, Ivana; Kind, Tobias; Torbašinović, Hrvoje
    • Journal of Cheminformatics, Vol. 9, Issue 1
    • DOI: 10.1186/s13321-017-0219-x

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics
    journal, May 2018

