U.S. Department of Energy
Office of Scientific and Technical Information

CFM-ID 4.0: More Accurate ESI MS/MS Spectral Prediction and Compound Identification

Journal Article · Analytical Chemistry

In metabolomics, mass spectrometry (MS) is the method most commonly used for identifying and annotating metabolites. Because identification typically involves matching a measured MS spectrum against an experimentally acquired reference spectral library, the approach is limited by the size and coverage of such libraries (which typically number in the thousands of compounds). These experimental libraries can be greatly extended by predicting the MS spectra of known chemical structures (which number in the millions) to create computational reference spectral libraries. To facilitate the generation of predicted spectral reference libraries, we developed CFM-ID, a computer program that can accurately predict ESI-MS/MS spectra for a given compound structure. CFM-ID is one of the best-performing methods for compound-to-mass-spectrum prediction, and also one of the top tools for in silico mass-spectrum-to-compound identification. This work improves CFM-ID’s ability to predict ESI-MS/MS spectra from compounds by (1) learning parameters from features based on the molecular topology, (2) adding a new approach to ring cleavage that models such cleavage as a sequence of simple chemical bond dissociations, and (3) expanding its hand-written rule-based predictor to cover more chemical classes, including acylcarnitines, acylcholines, flavonols, flavones, flavanones, and flavonoid glycosides. We demonstrate that this new version of CFM-ID (version 4.0) is significantly more accurate than previous CFM-ID versions in terms of both ESI-MS/MS spectral prediction and compound identification. CFM-ID 4.0 is available as a web service at http://cfmid4.wishartlab.com/, and Docker images can be downloaded from https://hub.docker.com/r/wishartlab/cfmid.
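The library-matching step described in the abstract can be illustrated with a minimal sketch. It is not CFM-ID's scoring function: it assumes a plain cosine similarity over m/z-binned peak lists, and the function names (`bin_peaks`, `cosine_similarity`, `rank_candidates`), the 0.1 bin width, and all peak values are hypothetical choices made only for illustration.

```python
from math import sqrt


def bin_peaks(peaks, bin_width=0.1):
    """Sum the intensities of (m/z, intensity) peaks that fall in the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(query, reference, bin_width=0.1):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    q = bin_peaks(query, bin_width)
    r = bin_peaks(reference, bin_width)
    dot = sum(value * r.get(key, 0.0) for key, value in q.items())
    norm_q = sqrt(sum(value * value for value in q.values()))
    norm_r = sqrt(sum(value * value for value in r.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_candidates(query, library, bin_width=0.1):
    """Score every library spectrum against the query, best match first."""
    scored = [
        (name, cosine_similarity(query, peaks, bin_width))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: the peak lists below are made up for illustration
# and do not correspond to any real compound or to CFM-ID output.
library = {
    "candidate_A": [(85.03, 100.0), (144.10, 40.0), (204.12, 20.0)],
    "candidate_B": [(60.08, 100.0), (104.11, 30.0)],
}
query = [(85.03, 90.0), (144.10, 50.0), (204.12, 10.0)]

ranked = rank_candidates(query, library)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this sketch the library could hold either experimental or predicted spectra; the ranking logic is the same, which is why predicted libraries can extend the coverage of experimental ones.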

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1824327
Report Number(s):
PNNL-SA-160966
Journal Information:
Analytical Chemistry, Vol. 93, Issue 34
Country of Publication:
United States
Language:
English
