DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

Abstract

In shotgun proteomics, database search algorithms rely on fragmentation models to predict the fragment ions that should be observed for a given peptide sequence. The most widely used strategy (the Naive model) is oversimplified: it cleaves all peptide bonds with equal probability and produces fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining charge retention during collision-induced dissociation (CID) or higher-energy collision-induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. Basophile increased identification rates by 26% (on average) over the Naive model when analyzing triply charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
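
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the published Basophile model: the feature definitions, the weights, the thresholds, the `min_prob` cutoff, and all function names below are hypothetical choices made only to show how a proportional-odds (ordinal regression) equation could prune the fragment list that a Naive model enumerates.

```python
import math

BASIC = set("RKH")  # residues treated as basic here (an assumption of this sketch)

def naive_fragments(peptide, precursor_charge):
    """Naive model: every bond cleaves; b and y ions at every charge below the precursor's."""
    n = len(peptide)
    return [(series, idx, z)
            for i in range(1, n)
            for series, idx in (("b", i), ("y", n - i))
            for z in range(1, precursor_charge)]

def ordinal_probs(eta, thresholds):
    """Proportional-odds model: P(Y <= k) = sigmoid(theta_k - eta); returns per-category probabilities."""
    cum = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def predicted_fragments(peptide, precursor_charge, weights=(3.0, 2.0), min_prob=0.2):
    """Keep only charge splits whose predicted probability reaches min_prob.

    Outcome category k means the b ion retains k charges and the y ion the rest.
    Weights, thresholds, and min_prob are illustrative placeholders, not fitted values.
    """
    if precursor_charge < 2:
        raise ValueError("this sketch assumes a precursor charge of at least 2")
    n = len(peptide)
    n_basic = sum(r in BASIC for r in peptide)
    k = precursor_charge - 1                      # number of ordered categories
    thresholds = [2.0 * (j + 1) - k for j in range(k - 1)]  # increasing, symmetric
    kept = []
    for i in range(1, n):
        basic_on_b = sum(r in BASIC for r in peptide[:i])
        basic_frac = basic_on_b / n_basic if n_basic else 0.5
        length_frac = i / n
        # Centered features: share of basic residues and of length on the b side.
        eta = weights[0] * (basic_frac - 0.5) + weights[1] * (length_frac - 0.5)
        for idx, p in enumerate(ordinal_probs(eta, thresholds)):
            b_charge = idx + 1
            if p >= min_prob:
                kept.append(("b", i, b_charge))
                kept.append(("y", n - i, precursor_charge - b_charge))
    return kept

if __name__ == "__main__":
    pep = "AAAAAAKAAAAAAR"  # made-up peptide, for illustration only
    print(len(naive_fragments(pep, 3)), len(predicted_fragments(pep, 3)))
```

In this sketch the regression is evaluated once per cleavage site with a handful of arithmetic operations, which is the kind of cost that makes such a model usable inside a search loop. A real implementation would fit the coefficients to annotated spectra rather than fix them by hand.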

Authors:
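 Wang, Dong [1]; Dasari, Surendra [1]; Chambers, Matthew C. [1]; Holman, Jerry D. [2]; Chen, Kan [3]; Liebler, Daniel [3]; Orton, Daniel J. [4]; Purvine, Samuel O. [4]; Monroe, Matthew E. [4]; Chung, Chang Y. [5]; Rose, Kristie L. [3]; Tabb, David L. [6]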
  1. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics
  2. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics; Mayo Clinic, Rochester, MN (United States). Div. of Biomedical Statistics and Informatics
  3. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biochemistry
  4. Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Environmental Molecular Sciences Lab. (EMSL)
  5. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Pharmacology
  6. Vanderbilt Univ. Medical Center, Nashville, TN (United States). Depts. of Biomedical Informatics, Biochemistry and Vanderbilt-Ingram Cancer Center
Publication Date:
2013-03-07
Research Org.:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)
Sponsoring Org.:
USDOE
OSTI Identifier:
1095420
Report Number(s):
PNNL-SA-98549
Journal ID: ISSN 1672-0229; 47418; KP1601010
Grant/Contract Number:  
AC05-76RL01830
Resource Type:
Accepted Manuscript
Journal Name:
Genomics, Proteomics & Bioinformatics
Additional Journal Information:
Journal Volume: 11; Journal Issue: 2; Journal ID: ISSN 1672-0229
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 59 BASIC BIOLOGICAL SCIENCES; Environmental Molecular Sciences Laboratory

Citation Formats

Wang, Dong, Dasari, Surendra, Chambers, Matthew C., Holman, Jerry D., Chen, Kan, Liebler, Daniel, Orton, Daniel J., Purvine, Samuel O., Monroe, Matthew E., Chung, Chang Y., Rose, Kristie L., and Tabb, David L. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates. United States: N. p., 2013. Web. doi:10.1016/j.gpb.2012.11.004.
Wang, Dong, Dasari, Surendra, Chambers, Matthew C., Holman, Jerry D., Chen, Kan, Liebler, Daniel, Orton, Daniel J., Purvine, Samuel O., Monroe, Matthew E., Chung, Chang Y., Rose, Kristie L., & Tabb, David L. (2013). Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates. United States. https://doi.org/10.1016/j.gpb.2012.11.004
Wang, Dong, Dasari, Surendra, Chambers, Matthew C., Holman, Jerry D., Chen, Kan, Liebler, Daniel, Orton, Daniel J., Purvine, Samuel O., Monroe, Matthew E., Chung, Chang Y., Rose, Kristie L., and Tabb, David L. 2013. "Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates". United States. https://doi.org/10.1016/j.gpb.2012.11.004. https://www.osti.gov/servlets/purl/1095420.
@article{osti_1095420,
title = {Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates},
author = {Wang, Dong and Dasari, Surendra and Chambers, Matthew C. and Holman, Jerry D. and Chen, Kan and Liebler, Daniel and Orton, Daniel J. and Purvine, Samuel O. and Monroe, Matthew E. and Chung, Chang Y. and Rose, Kristie L. and Tabb, David L.},
abstractNote = {In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.},
doi = {10.1016/j.gpb.2012.11.004},
journal = {Genomics, Proteomics \& Bioinformatics},
number = 2,
volume = 11,
place = {United States},
year = {2013},
month = {3}
}
