DOE PAGES

Title: Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

In shotgun proteomics, database search algorithms rely on fragmentation models to predict the fragment ions that should be observed for a given peptide sequence. The most widely used strategy (the Naive model) is oversimplified: it cleaves all peptide bonds with equal probability and produces fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining charge retention during collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) of charged peptides. This model improves prediction accuracy by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. When analyzing triply charged precursors from ion trap data, Basophile increased identification rates by 26% on average over the Naive model. Basophile achieves simplicity and speed by solving the prediction problem with a single ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
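To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the published Basophile implementation. It enumerates fragments the way the Naive model is described above, then prunes them with a generic proportional-odds (ordinal logistic) model over two features: the share of the peptide's length and the share of its basic residues (R, K, H) that fall in the N-terminal fragment. The function names, feature definitions, weights, and thresholds are all illustrative assumptions; the fitted coefficients from the paper are not reproduced in this record.

```python
import math

BASIC = set("RKH")  # residues treated as basic in this sketch (assumption)

def naive_fragments(peptide, precursor_charge):
    """Naive model: cleave every bond, emit b/y ions at every charge below the precursor."""
    max_z = max(1, precursor_charge - 1)  # assumption: singly charged precursors still yield 1+ ions
    out = []
    for cut in range(1, len(peptide)):
        for z in range(1, max_z + 1):
            out.append(("b", cut, z))
            out.append(("y", len(peptide) - cut, z))
    return out

def charge_probs(peptide, cut, precursor_charge, weights=(3.0, 2.0)):
    """Proportional-odds model: probs[k] = P(b fragment retains k of the precursor charges).

    The weights and thresholds are hypothetical placeholders, not fitted values.
    """
    total_basic = sum(aa in BASIC for aa in peptide)
    b_basic = sum(aa in BASIC for aa in peptide[:cut])
    basic_frac = b_basic / total_basic if total_basic else 0.5
    len_frac = cut / len(peptide)
    score = weights[0] * (basic_frac - 0.5) + weights[1] * (len_frac - 0.5)
    # Increasing thresholds give a valid cumulative distribution P(Y <= k).
    thresholds = [k - precursor_charge / 2 + 0.5 for k in range(precursor_charge)]
    cum = [1.0 / (1.0 + math.exp(-(th - score))) for th in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def predicted_fragments(peptide, precursor_charge):
    """Keep only the most probable charge split at each bond."""
    out = []
    for cut in range(1, len(peptide)):
        probs = charge_probs(peptide, cut, precursor_charge)
        k = max(range(len(probs)), key=probs.__getitem__)  # most likely b-ion charge
        if k >= 1:
            out.append(("b", cut, k))
        if precursor_charge - k >= 1:
            out.append(("y", len(peptide) - cut, precursor_charge - k))
    return out

if __name__ == "__main__":
    pep, z = "PEPTIDEKAR", 3  # made-up peptide for illustration
    print(len(naive_fragments(pep, z)), "naive fragments")
    print(len(predicted_fragments(pep, z)), "fragments after ordinal-regression pruning")
```

Because the two fragments at a bond share the precursor's charges, the sketch emits at most one b and one y ion per bond, whereas the Naive enumeration emits every charge state for both. A real search engine could instead keep every charge split whose probability exceeds a cutoff; choosing only the top split is a simplification made here.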
Authors:
 [1] ;  [1] ;  [1] ;  [2] ;  [3] ;  [3] ;  [4] ;  [4] ;  [4] ;  [5] ;  [3] ;  [6]
  1. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics
  2. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics; Mayo Clinic, Rochester, MN (United States). Div. of Biomedical Statistics and Informatics
  3. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biochemistry
  4. Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Environmental Molecular Sciences Lab. (EMSL)
  5. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Pharmacology
  6. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Depts. of Biomedical Informatics, Biochemistry and Vanderbilt-Ingram Cancer Center
Publication Date:
OSTI Identifier:
1095420
Report Number(s):
PNNL--SA-98549
Journal ID: ISSN 1672-0229; 47418; KP1601010
Grant/Contract Number:
AC05-76RL01830
Type:
Accepted Manuscript
Journal Name:
Genomics, Proteomics & Bioinformatics
Additional Journal Information:
Journal Volume: 11; Journal Issue: 2; Journal ID: ISSN 1672-0229
Publisher:
Elsevier
Research Org:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)
Sponsoring Org:
USDOE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 59 BASIC BIOLOGICAL SCIENCES; Environmental Molecular Sciences Laboratory