Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics
Proteomics is a rapidly advancing field that offers a new perspective on biological systems. Mass spectrometry (MS) is a popular experimental approach because it allows global protein characterization of a sample in a high-throughput manner. Protein identification is based on the spectral signatures of peptides, the fragments of the constituent proteins. Peptide identification is typically performed with a computational database search algorithm; however, these algorithms return a large number of false positive identifications. We present a new scoring algorithm that uses a support vector machine (SVM) to integrate database scoring metrics with peptide physicochemical properties, resulting in an improved ability to separate true from false peptide identifications from MS. The Peptide Identification Classifier SVM (PICS) score, which uses only five variables, is significantly more accurate than the single best database metric, as quantified by the area under a Receiver Operating Characteristic curve (~0.94 versus ~0.90).
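The comparison above rests on the area under a Receiver Operating Characteristic curve (AUC). As a minimal sketch of how that metric can be computed, the following uses the rank-sum formulation of AUC on made-up toy values. The function name `roc_auc`, the score lists, and the labels are illustrative assumptions and are not taken from the paper; the sketch does not reproduce the PICS classifier itself.

```python
def roc_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formulation: the fraction of
    (true, false) identification pairs in which the true identification
    receives the higher score, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true and one false identification")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = correct peptide identification, 0 = false positive.
labels = [1, 1, 1, 0, 0, 0]
single_metric = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]   # made-up single database score
combined_score = [0.9, 0.6, 0.8, 0.5, 0.1, 0.3]  # made-up combined classifier score

auc_single = roc_auc(single_metric, labels)
auc_combined = roc_auc(combined_score, labels)
print(f"single metric AUC: {auc_single:.3f}")
print(f"combined score AUC: {auc_combined:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a higher value means the score ranks true identifications above false ones more consistently.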
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 928287
- Report Number(s): PNNL-SA-58675; KJ0102000; TRN: US200815%%811
- Resource Relation: Conference: The Sixth International Conference on Machine Learning and Applications (ICMLA ’07), 500-505
- Country of Publication: United States
- Language: English
Similar Records
A robust linear regression based algorithm for automated evaluation of peptide identifications from shotgun proteomics by use of reversed-phase liquid chromatography retention time
Probability-Based Evaluation of Peptide and Protein Identifications from Tandem Mass Spectrometry and SEQUEST Analysis: The Human Proteome