U.S. Department of Energy
Office of Scientific and Technical Information

Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection

Journal Article · Journal of Proteome Research
 [1];  [2];  [1];  [1];  [3];  [4];  [5];  [4];  [4];  [6];  [6];  [6];  [6];  [4];  [1]
  1. Université Paris Cité (France). Institut Pasteur; Centre National de la Recherche Scientifique (CNRS) (France)
  2. Univ. of Tübingen (Germany)
  3. Univ. of Wisconsin, Madison, WI (United States)
  4. Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
  5. Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)
  6. Univ. of Idaho, Moscow, ID (United States)
Generating top-down tandem mass spectra (MS/MS) for complex mixtures of proteoforms has become possible through improvements in fractionation, on-line separation, dissociation, and mass analysis. The algorithms that match tandem mass spectra to sequences have undergone a parallel evolution, with both spectral alignment and peak matching being paired with diverse methods for scoring proteoform-spectral matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification through three distinct challenges. The first is to identify a large yield of PrSMs while controlling the false discovery rate (FDR) when identifying thousands of proteoforms from complex cell lysates with four software workflows: ProSight Proteome Discoverer, TopPIC, Informed Proteomics, and pTop. The second is to deconvolve data from both Thermo Orbitrap-class and Bruker maXis Q-TOF instruments so as to produce consistent precursor charge and mass determinations while generating fragment mass lists that optimize identification. The third is to detect diverse post-translational modifications (PTMs) in proteoforms from cow milk and human ovarian tissue. The data demonstrate that existing software suites achieve admirable sensitivity, in some cases identifying a third of collected tandem mass spectra with FDR controlled below 2%; the overlap in these PrSMs, however, illustrates real value in searching data with multiple search engines. Differences among identification workflows seem to result from each search algorithm incorporating its own deconvolution algorithm. By passing the output of multiple deconvolution routes (Thermo Xtract, Bruker Auto MSn, Mascot Distiller, TopFD, and FLASHDeconv) to the downstream TopPIC search algorithm, we were able to detect common causes of deconvolution disagreement. The detection of PTMs was highly inconsistent among search algorithms, with some workflows suggesting that as few as 1% of PrSMs from cow milk were singly phosphorylated while other workflows found that 18% of PrSMs were singly phosphorylated. Taken together, these results make a strong argument for top-down researchers to adopt a standard practice of analyzing each MS/MS experiment with at least two different search engines.
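The two comparisons the abstract relies on, PrSM overlap between search engines and the share of singly phosphorylated PrSMs, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the use of scan numbers as the matching key, the "Phospho" label, and the toy inputs are hypothetical and are not taken from the study or from any of the named software packages.

```python
from typing import Dict, Iterable, List


def compare_scan_sets(scans_a: Iterable[int], scans_b: Iterable[int]) -> Dict[str, float]:
    """Summarize the overlap of spectra identified by two search engines.

    Scans are matched by scan number only (an assumption); a stricter
    comparison could also require the same proteoform assignment.
    """
    a, b = set(scans_a), set(scans_b)
    shared = a & b
    union = a | b
    best_single = max(len(a), len(b))
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "union": len(union),
        # Jaccard index: shared identifications over all identifications
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Relative gain from combining both engines over the better one alone
        "gain_over_best": (len(union) - best_single) / best_single if best_single else 0.0,
    }


def fraction_singly_modified(prsm_mods: List[List[str]], label: str = "Phospho") -> float:
    """Fraction of PrSMs carrying exactly one modification with the given label.

    Each PrSM is represented as a list of modification labels (hypothetical format).
    """
    if not prsm_mods:
        return 0.0
    singly = sum(1 for mods in prsm_mods if mods.count(label) == 1)
    return singly / len(prsm_mods)


if __name__ == "__main__":
    # Toy data, not results from the study
    engine_a = [1, 2, 3, 4]
    engine_b = [3, 4, 5]
    print(compare_scan_sets(engine_a, engine_b))
    print(fraction_singly_modified([["Phospho"], [], ["Phospho", "Phospho"], ["Acetyl"]]))
```

A low Jaccard index together with a large gain over the best single engine would correspond to the situation the abstract describes, in which a second search engine recovers PrSMs that the first one misses.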
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)
Sponsoring Organization:
European Union (EU); National Institutes of Health (NIH); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1988433
Report Number(s):
PNNL-SA-178802
Journal Information:
Journal of Proteome Research, Vol. 22, Issue 7; ISSN 1535-3893
Publisher:
American Chemical Society (ACS)
Country of Publication:
United States
Language:
English
