OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

Journal Article · Genomics, Proteomics & Bioinformatics
 [1];  [1];  [1];  [2];  [3];  [3];  [4];  [4];  [4];  [5];  [3];  [6]
  1. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics
  2. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics; Mayo Clinic, Rochester, MN (United States). Div. of Biomedical Statistics and Informatics
  3. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biochemistry
  4. Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Environmental Molecular Sciences Lab. (EMSL)
  5. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Pharmacology
  6. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Depts. of Biomedical Informatics and Biochemistry; Vanderbilt-Ingram Cancer Center

In shotgun proteomics, database search algorithms rely on fragmentation models to predict the fragment ions that should be observed for a given peptide sequence. The most widely used strategy, the Naive model, is oversimplified: it cleaves all peptide bonds with equal probability and produces fragments of every charge state below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining charge retention during collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) of charged peptides. This model improves prediction accuracy by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. When analyzing triply charged precursors from ion trap data, Basophile increased identification rates by 26% on average over the Naive model. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
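The contrast between the two models can be sketched in code. This record does not give Basophile's exact features or fitted equation, so the sketch below is hypothetical: it compares the Naive enumeration (every b/y fragment at every charge from 1 to one below the precursor) with a generic proportional-odds (cumulative logit) ordinal regression over two assumed features, the number of basic residues (K, R, H) in the fragment and the fragment's length as a fraction of the peptide. The coefficients, cut points, helper names (`naive_fragments`, `OrdinalChargeModel`, `ordinal_fragments`), and the rule of keeping only the most probable charge per fragment are illustrative assumptions, not the published model.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

BASIC_RESIDUES = set("KRH")  # assumed basic residues: Lys, Arg, His


def naive_fragments(peptide: str, precursor_charge: int) -> list[tuple[str, int, int]]:
    """Naive model: cleave every peptide bond and predict each b/y fragment
    at every charge from 1 up to precursor_charge - 1 (at least charge 1)."""
    max_charge = max(1, precursor_charge - 1)
    n = len(peptide)
    fragments = []
    for i in range(1, n):
        for z in range(1, max_charge + 1):
            fragments.append(("b", i, z))
            fragments.append(("y", n - i, z))
    return fragments


@dataclass
class OrdinalChargeModel:
    """Proportional-odds (cumulative logit) model of fragment charge.
    HYPOTHETICAL coefficients and cut points, not the published Basophile fit."""
    beta_basic: float = 1.5   # weight on basic residues in the fragment
    beta_length: float = 2.0  # weight on fragment length / peptide length
    # Increasing cut points per precursor charge z: max(1, z - 1) charge
    # categories need one fewer cut point than there are categories.
    cut_points: dict[int, tuple[float, ...]] = field(
        default_factory=lambda: {1: (), 2: (), 3: (2.5,), 4: (2.5, 5.0)}
    )

    def charge_probs(self, precursor_charge: int, n_basic: int, length_frac: float) -> list[float]:
        """Return P(fragment charge = k) for k = 1, 2, ..."""
        eta = self.beta_basic * n_basic + self.beta_length * length_frac
        # P(charge <= k) = sigmoid(theta_k - eta); the last category closes at 1.
        cumulative = [1.0 / (1.0 + math.exp(eta - theta))
                      for theta in self.cut_points[precursor_charge]]
        cumulative.append(1.0)
        probs = [cumulative[0]]
        probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
        return probs


def ordinal_fragments(peptide: str, precursor_charge: int,
                      model: OrdinalChargeModel) -> list[tuple[str, int, int]]:
    """Predict each b/y fragment only at its most probable charge."""
    n = len(peptide)
    fragments = []
    for i in range(1, n):
        for ion, seq in (("b", peptide[:i]), ("y", peptide[i:])):
            n_basic = sum(aa in BASIC_RESIDUES for aa in seq)
            probs = model.charge_probs(precursor_charge, n_basic, len(seq) / n)
            best_charge = 1 + max(range(len(probs)), key=probs.__getitem__)
            fragments.append((ion, len(seq), best_charge))
    return fragments


if __name__ == "__main__":
    model = OrdinalChargeModel()
    print(len(naive_fragments("PEPTIDEK", 3)),
          len(ordinal_fragments("PEPTIDEK", 3, model)))  # prints: 28 14
```

For the example peptide `PEPTIDEK` at precursor charge 3, the Naive enumeration lists 28 b/y fragments (7 bonds × 2 ion types × 2 charges), while the sketch keeps 14, assigning charge 2 only to long fragments that carry a basic residue (for example y7) and charge 1 to short, non-basic ones (for example b1). Because scoring reduces to one linear predictor and a few sigmoids per fragment, this style of model is cheap enough to evaluate on the fly during database search.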

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1095420
Report Number(s):
PNNL-SA-98549; 47418; KP1601010
Journal Information:
Genomics, Proteomics & Bioinformatics, Vol. 11, Issue 2; ISSN 1672-0229
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
