Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates
Abstract
In shotgun proteomics, database search algorithms rely on fragmentation models to predict the fragment ions that should be observed for a given peptide sequence. The most widely used strategy (the Naive model) is oversimplified: it cleaves all peptide bonds with equal probability and produces fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining charge retention during collision-induced dissociation (CID) or higher-energy collision-induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. Basophile increased identification rates by 26% on average over the Naive model when analyzing triply charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
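The contrast drawn in the abstract can be sketched in a few lines of Python. This is an illustration only, not the published Basophile model: the feature set (relative b-fragment size and basic-residue imbalance), the choice of R, K, and H as basic residues, the ordered categories (charge retained by the b fragment, from 0 up to the precursor charge), and every threshold and weight are hypothetical placeholders. The Naive enumeration follows the abstract's description; the second part uses a generic proportional-odds ordinal regression, P(Y <= k) = sigmoid(theta_k - w.x).

```python
import math

BASIC_RESIDUES = set("RKH")  # assumption: Arg, Lys, His counted as basic


def naive_fragments(peptide, precursor_charge):
    """Naive model: b and y ions at every bond, at every charge below the precursor's."""
    fragments = []
    for bond in range(1, len(peptide)):
        for charge in range(1, precursor_charge):
            fragments.append(("b", bond, charge))
            fragments.append(("y", len(peptide) - bond, charge))
    return fragments


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def b_charge_probabilities(peptide, bond, thresholds, weights):
    """Proportional-odds sketch: probability that the b fragment keeps 0, 1, ... protons.

    thresholds must be increasing; weights are hypothetical, not fitted values.
    """
    b_part, y_part = peptide[:bond], peptide[bond:]
    basic_b = sum(res in BASIC_RESIDUES for res in b_part)
    basic_y = sum(res in BASIC_RESIDUES for res in y_part)
    features = [bond / len(peptide), basic_b - basic_y]
    score = sum(w * x for w, x in zip(weights, features))
    # Cumulative probabilities P(Y <= k), closed with 1.0 for the top category.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    return [cumulative[0]] + [
        cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))
    ]


def predicted_fragments(peptide, precursor_charge, thresholds, weights):
    """Keep only the most probable charge split at each bond."""
    if len(thresholds) != precursor_charge:
        raise ValueError("need one threshold per unit of precursor charge")
    fragments = []
    for bond in range(1, len(peptide)):
        probs = b_charge_probabilities(peptide, bond, thresholds, weights)
        b_charge = max(range(len(probs)), key=probs.__getitem__)
        y_charge = precursor_charge - b_charge
        if b_charge > 0:
            fragments.append(("b", bond, b_charge))
        if y_charge > 0:
            fragments.append(("y", len(peptide) - bond, y_charge))
    return fragments


if __name__ == "__main__":
    peptide, charge = "PEPTIDEK", 3          # hypothetical example peptide
    thetas, betas = [-2.0, 0.0, 2.0], [1.5, 0.8]  # placeholder parameters
    print(len(naive_fragments(peptide, charge)), "naive fragments")
    print(len(predicted_fragments(peptide, charge, thetas, betas)), "predicted fragments")
```

For a precursor of charge z, the Naive enumeration yields 2(z - 1) fragments per bond, while the sketch keeps at most two, which is the kind of reduction in unnecessary fragments the abstract describes.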
- Authors:
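- Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; Holman, Jerry D.; Chen, Kan; Liebler, Daniel; Orton, Daniel J.; Purvine, Samuel O.; Monroe, Matthew E.; Chung, Chang Y.; Rose, Kristie L.; Tabb, David L.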
- Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics
- Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biomedical Informatics; Mayo Clinic, Rochester, MN (United States). Div. of Biomedical Statistics and Informatics
- Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Biochemistry
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Environmental Molecular Sciences Lab. (EMSL)
- Vanderbilt Univ. Medical Center, Nashville, TN (United States). Dept. of Pharmacology
- Vanderbilt Univ. Medical Center, Nashville, TN (United States). Depts. of Biomedical Informatics, Biochemistry and Vanderbilt-Ingram Cancer Center
- Publication Date:
- 2013-03-07
- Research Org.:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States). Environmental Molecular Sciences Laboratory (EMSL)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1095420
- Report Number(s):
- PNNL-SA-98549
Journal ID: ISSN 1672-0229; 47418; KP1601010
- Grant/Contract Number:
- AC05-76RL01830
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Genomics, Proteomics & Bioinformatics
- Additional Journal Information:
- Journal Volume: 11; Journal Issue: 2; Journal ID: ISSN 1672-0229
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 59 BASIC BIOLOGICAL SCIENCES; Environmental Molecular Sciences Laboratory
Citation Formats
Wang, Dong, Dasari, Surendra, Chambers, Matthew C., Holman, Jerry D., Chen, Kan, Liebler, Daniel, Orton, Daniel J., Purvine, Samuel O., Monroe, Matthew E., Chung, Chang Y., Rose, Kristie L., and Tabb, David L. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates. United States: N. p., 2013.
Web. doi:10.1016/j.gpb.2012.11.004.
Wang, Dong, Dasari, Surendra, Chambers, Matthew C., Holman, Jerry D., Chen, Kan, Liebler, Daniel, Orton, Daniel J., Purvine, Samuel O., Monroe, Matthew E., Chung, Chang Y., Rose, Kristie L., & Tabb, David L. Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates. United States. https://doi.org/10.1016/j.gpb.2012.11.004
Wang, Dong, Dasari, Surendra, Chambers, Matthew C., Holman, Jerry D., Chen, Kan, Liebler, Daniel, Orton, Daniel J., Purvine, Samuel O., Monroe, Matthew E., Chung, Chang Y., Rose, Kristie L., and Tabb, David L. 2013.
"Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates". United States. https://doi.org/10.1016/j.gpb.2012.11.004. https://www.osti.gov/servlets/purl/1095420.
@article{osti_1095420,
title = {Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates},
author = {Wang, Dong and Dasari, Surendra and Chambers, Matthew C. and Holman, Jerry D. and Chen, Kan and Liebler, Daniel and Orton, Daniel J. and Purvine, Samuel O. and Monroe, Matthew E. and Chung, Chang Y. and Rose, Kristie L. and Tabb, David L.},
abstractNote = {In shotgun proteomics, database search algorithms rely on fragmentation models to predict the fragment ions that should be observed for a given peptide sequence. The most widely used strategy (the Naive model) is oversimplified: it cleaves all peptide bonds with equal probability and produces fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining charge retention during collision-induced dissociation (CID) or higher-energy collision-induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. Basophile increased identification rates by 26\% on average over the Naive model when analyzing triply charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.},
doi = {10.1016/j.gpb.2012.11.004},
journal = {Genomics, Proteomics \& Bioinformatics},
number = 2,
volume = 11,
place = {United States},
year = {2013},
month = mar
}
Works referenced in this record:
TagRecon: High-Throughput Mutation Identification through Sequence Tagging
journal, April 2010
- Dasari, Surendra; Chambers, Matthew C.; Slebos, Robbert J.
- Journal of Proteome Research, Vol. 9, Issue 4
Improved Validation of Peptide MS/MS Assignments Using Spectral Intensity Prediction
journal, January 2007
- Sun, Shaojun; Meyer-Arendt, Karen; Eichelberger, Brian
- Molecular & Cellular Proteomics, Vol. 6, Issue 1
Towards understanding the tandem mass spectra of protonated oligopeptides. 1: Mechanism of amide bond cleavage
journal, January 2004
- Paizs, Béla; Suhai, Sándor
- Journal of the American Society for Mass Spectrometry, Vol. 15, Issue 1
Posterior Error Probabilities and False Discovery Rates: Two Sides of the Same Coin
journal, January 2008
- Käll, Lukas; Storey, John D.; MacCoss, Michael J.
- Journal of Proteome Research, Vol. 7, Issue 1
Towards understanding some ion intensity relationships for the tandem mass spectra of protonated peptides
journal, January 2002
- Paizs, Béla; Suhai, Sándor
- Rapid Communications in Mass Spectrometry, Vol. 16, Issue 17
Mass spectrometry-based proteomics
journal, March 2003
- Aebersold, Ruedi; Mann, Matthias
- Nature, Vol. 422, Issue 6928
Mining a Tandem Mass Spectrometry Database To Determine the Trends and Global Factors Influencing Peptide Fragmentation
journal, October 2003
- Kapp, Eugene A.; Schütz, Frédéric; Reid, Gavin E.
- Analytical Chemistry, Vol. 75, Issue 22
Prediction of Low-Energy Collision-Induced Dissociation Spectra of Peptides with Three or More Charges
journal, August 2005
- Zhang, Zhongqi
- Analytical Chemistry, Vol. 77, Issue 19
Pepitome: Evaluating Improved Spectral Library Search for Identification Complementarity and Quality Assessment
journal, January 2012
- Dasari, Surendra; Chambers, Matthew C.; Martinez, Misti A.
- Journal of Proteome Research, Vol. 11, Issue 3
MyriMatch: Highly Accurate Tandem Mass Spectral Peptide Identification by Multivariate Hypergeometric Analysis
journal, February 2007
- Tabb, David L.; Fernando, Christopher G.; Chambers, Matthew C.
- Journal of Proteome Research, Vol. 6, Issue 2
ProteoWizard: open source software for rapid proteomics tools development
journal, July 2008
- Kessner, Darren; Chambers, Matt; Burke, Robert
- Bioinformatics, Vol. 24, Issue 21
Deriving statistical models for predicting peptide tandem MS product ion intensities
journal, December 2003
- Schütz, F.; Kapp, E. A.; Simpson, R. J.
- Biochemical Society Transactions, Vol. 31, Issue 6
pNovo: De novo Peptide Sequencing and Identification Using HCD Spectra
journal, May 2010
- Chi, Hao; Sun, Rui-Xiang; Yang, Bing
- Journal of Proteome Research, Vol. 9, Issue 5
SQID: An Intensity-Incorporated Protein Identification Algorithm for Tandem Mass Spectrometry
journal, April 2011
- Li, Wenzhou; Ji, Li; Goya, Jonathan
- Journal of Proteome Research, Vol. 10, Issue 4
Proteomic Analysis of Chinese Hamster Ovary Cells
journal, October 2012
- Baycin-Hizal, Deniz; Tabb, David L.; Chaerkady, Raghothama
- Journal of Proteome Research, Vol. 11, Issue 11
An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database
journal, November 1994
- Eng, Jimmy K.; McCormack, Ashley L.; Yates, John R.
- Journal of the American Society for Mass Spectrometry, Vol. 5, Issue 11
Fragmentation Pathways of Protonated Peptides
journal, May 2006
- Paizs, Béla; Suhai, Sándor
- ChemInform, Vol. 37, Issue 21
Large-scale analysis of the yeast proteome by multidimensional protein identification technology
journal, March 2001
- Washburn, Michael P.; Wolters, Dirk; Yates, John R.
- Nature Biotechnology, Vol. 19, Issue 3
Intensity-based protein identification by machine learning from a library of tandem mass spectra
journal, January 2004
- Elias, Joshua E.; Gibbons, Francis D.; King, Oliver D.
- Nature Biotechnology, Vol. 22, Issue 2
Predicting Intensity Ranks of Peptide Fragment Ions
journal, April 2009
- Frank, Ari M.
- Journal of Proteome Research, Vol. 8, Issue 5
Prediction of Low-Energy Collision-Induced Dissociation Spectra of Peptides
journal, July 2004
- Zhang, Zhongqi
- Analytical Chemistry, Vol. 76, Issue 14
On the Accuracy and Limits of Peptide Fragmentation Spectrum Prediction
journal, February 2011
- Li, Sujun; Arnold, Randy J.; Tang, Haixu
- Analytical Chemistry, Vol. 83, Issue 3
Fragmentation pathways of protonated peptides
journal, January 2005
- Paizs, Béla; Suhai, Sándor
- Mass Spectrometry Reviews, Vol. 24, Issue 4
Sequence Dependence of Peptide Fragmentation Efficiency Curves Determined by Electrospray Ionization/Surface-Induced Dissociation Mass Spectrometry
journal, September 1994
- Jones, Jennifer L.; Dongre, Ashok R.; Somogyi, Arpad
- Journal of the American Chemical Society, Vol. 116, Issue 18
Influence of Peptide Composition, Gas-Phase Basicity, and Chemical Modification on Fragmentation Efficiency: Evidence for the Mobile Proton Model
journal, January 1996
- Dongré, Ashok R.; Jones, Jennifer L.; Somogyi, Árpád
- Journal of the American Chemical Society, Vol. 118, Issue 35
Mobile and localized protons: a framework for understanding peptide dissociation
journal, December 2000
- Wysocki, Vicki H.; Tsaprailis, George; Smith, Lori L.
- Journal of Mass Spectrometry, Vol. 35, Issue 12
Repeatability and Reproducibility in Proteomic Identifications by Liquid Chromatography−Tandem Mass Spectrometry
journal, February 2010
- Tabb, David L.; Vega-Montoto, Lorenzo; Rudnick, Paul A.
- Journal of Proteome Research, Vol. 9, Issue 2
Identifying Proteomic LC‐MS/MS Data Sets with Bumbershoot and IDPicker
journal, March 2012
- Holman, Jerry D.; Ma, Ze‐Qiang; Tabb, David L.
- Current Protocols in Bioinformatics, Vol. 37, Issue 1
Expediting the Development of Targeted SRM Assays: Using Data from Shotgun Proteomics to Automate Method Development
journal, June 2009
- Prakash, Amol; Tomazela, Daniela M.; Frewen, Barbara
- Journal of Proteome Research, Vol. 8, Issue 6