OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Robust Estimation of Peptide Abundance Ratios and Rigorous Scoring of Their Variability and Bias in Quantitative Shotgun Proteomics

Abstract

The abundance ratio between the light and heavy isotopologues of an isotopically labeled peptide can be estimated from their selected ion chromatograms. However, quantitative shotgun proteomics measurements yield selected ion chromatograms at highly variable signal-to-noise ratios for tens of thousands of peptides. This challenge calls for algorithms that not only robustly estimate the abundance ratios of different peptides but also rigorously score each abundance ratio for the expected estimation bias and variability. Scoring of the abundance ratios, much like scoring of sequence assignment for tandem mass spectra by peptide identification algorithms, enables filtering of unreliable peptide quantification and use of formal statistical inference in the subsequent protein abundance ratio estimation. In this study, a parallel paired covariance algorithm is used for robust peak detection in selected ion chromatograms. A peak profile is generated for each peptide, which is a scatter plot of ion intensities measured for the two isotopologues within their chromatographic peaks. Principal component analysis of the peak profile is proposed to estimate the peptide abundance ratio and to score the estimation with the signal-to-noise ratio of the peak profile (profile signal-to-noise ratio). We demonstrate that the profile signal-to-noise ratio is inversely correlated with the variability and bias of peptide abundance ratio estimation.
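
The principal component step described in the abstract can be illustrated with a minimal sketch. It assumes the peak profile is given as two equal-length lists of paired intensities, takes the slope of the first principal component as the abundance ratio, and uses the ratio of the two eigenvalues as a stand-in for the profile signal-to-noise ratio. The function name `pca_ratio` and this particular signal-to-noise definition are assumptions for illustration; the paper's exact formulas may differ.

```python
import math


def pca_ratio(reference, treatment):
    """Estimate a ratio from a 2-D peak profile by principal component analysis.

    Hypothetical helper: returns (slope, snr), where slope is the slope of the
    first principal component (treatment over reference) and snr is the ratio
    of the first to the second eigenvalue of the covariance matrix. The snr
    definition is an assumption, not necessarily the one used in the paper.
    """
    n = len(reference)
    if n != len(treatment) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 points")

    mean_x = sum(reference) / n
    mean_y = sum(treatment) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x in reference) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in treatment) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(reference, treatment)) / (n - 1)
    if sxy == 0:
        raise ValueError("intensities are uncorrelated; slope is undefined")

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1 = half_trace + root
    lam2 = half_trace - root

    # The eigenvector for lam1 is (sxy, lam1 - sxx), so its slope is:
    slope = (lam1 - sxx) / sxy

    # Variance along the first component relative to the residual variance.
    snr = lam1 / lam2 if lam2 > 0 else math.inf
    return slope, snr


if __name__ == "__main__":
    heavy = [1.0, 2.0, 3.0, 4.0, 5.0]
    light = [2.1, 3.9, 6.2, 7.8, 10.0]
    print(pca_ratio(heavy, light))
```

Unlike an ordinary least-squares fit, the principal component treats noise in both intensity axes symmetrically, which is one plausible reason to prefer it when both isotopologue chromatograms are noisy.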

Authors:
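Pan, Chongle [1]; Kora, Guruprasad H [1]; Tabb, Dave L [1]; Pelletier, Dale A [1]; McDonald, W Hayes [1]; Hurst, Gregory [2]; Hettich, Robert [3]; Samatova, Nagiza F [1]
  1. ORNL
  2. ORNL
  3. ORNL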
Publication Date:
2006
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)
OSTI Identifier:
1003641
DOE Contract Number:
DE-AC05-00OR22725
Resource Type:
Journal Article
Resource Relation:
Journal Name: Analytical Chemistry; Journal Volume: 78; Journal Issue: 20
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ABUNDANCE; ALGORITHMS; DETECTION; MASS SPECTRA; PEPTIDES; PROTEINS; SIGNAL-TO-NOISE RATIO

Citation Formats

Pan, Chongle, Kora, Guruprasad H, Tabb, Dave L, Pelletier, Dale A, McDonald, W Hayes, Hurst, Gregory, Hettich, Robert, and Samatova, Nagiza F. Robust Estimation of Peptide Abundance Ratios and Rigorous Scoring of Their Variability and Bias in Quantitative Shotgun Proteomics. United States: N. p., 2006. Web. doi:10.1021/ac0606554.
Pan, Chongle, Kora, Guruprasad H, Tabb, Dave L, Pelletier, Dale A, McDonald, W Hayes, Hurst, Gregory, Hettich, Robert, & Samatova, Nagiza F. (2006). Robust Estimation of Peptide Abundance Ratios and Rigorous Scoring of Their Variability and Bias in Quantitative Shotgun Proteomics. United States. doi:10.1021/ac0606554.
Pan, Chongle, Kora, Guruprasad H, Tabb, Dave L, Pelletier, Dale A, McDonald, W Hayes, Hurst, Gregory, Hettich, Robert, and Samatova, Nagiza F. 2006. "Robust Estimation of Peptide Abundance Ratios and Rigorous Scoring of Their Variability and Bias in Quantitative Shotgun Proteomics". United States. doi:10.1021/ac0606554.
@article{osti_1003641,
title = {Robust Estimation of Peptide Abundance Ratios and Rigorous Scoring of Their Variability and Bias in Quantitative Shotgun Proteomics},
author = {Pan, Chongle and Kora, Guruprasad H and Tabb, Dave L and Pelletier, Dale A and McDonald, W Hayes and Hurst, Gregory and Hettich, Robert and Samatova, Nagiza F},
abstractNote = {The abundance ratio between the light and heavy isotopologues of an isotopically labeled peptide can be estimated from their selected ion chromatograms. However, quantitative shotgun proteomics measurements yield selected ion chromatograms at highly variable signal-to-noise ratios for tens of thousands of peptides. This challenge calls for algorithms that not only robustly estimate the abundance ratios of different peptides but also rigorously score each abundance ratio for the expected estimation bias and variability. Scoring of the abundance ratios, much like scoring of sequence assignment for tandem mass spectra by peptide identification algorithms, enables filtering of unreliable peptide quantification and use of formal statistical inference in the subsequent protein abundance ratio estimation. In this study, a parallel paired covariance algorithm is used for robust peak detection in selected ion chromatograms. A peak profile is generated for each peptide, which is a scatter plot of ion intensities measured for the two isotopologues within their chromatographic peaks. Principal component analysis of the peak profile is proposed to estimate the peptide abundance ratio and to score the estimation with the signal-to-noise ratio of the peak profile (profile signal-to-noise ratio). We demonstrate that the profile signal-to-noise ratio is inversely correlated with the variability and bias of peptide abundance ratio estimation.},
doi = {10.1021/ac0606554},
journal = {Analytical Chemistry},
number = 20,
volume = 78,
place = {United States},
year = {2006},
month = {1}
}
  • A profile likelihood algorithm is proposed for quantitative shotgun proteomics to infer the abundance ratios of proteins from the abundance ratios of isotopically labeled peptides derived from proteolysis. Previously, we have shown that the estimation variability and bias of peptide abundance ratios can be predicted from their profile signal-to-noise ratios. Given multiple quantified peptides for a protein, the profile likelihood algorithm probabilistically weighs the peptide abundance ratios by their inferred estimation variability, accounts for their expected estimation bias, and suppresses contribution from outliers. This algorithm yields maximum likelihood point estimation and profile likelihood confidence interval estimation of protein abundance ratios. This point estimator is more accurate than an estimator based on the average of peptide abundance ratios. The confidence interval estimation provides an "error bar" for each protein abundance ratio that reflects its estimation precision and statistical uncertainty. The accuracy of the point estimation and the precision and confidence level of the interval estimation were benchmarked with standard mixtures of isotopically labeled proteomes. The profile likelihood algorithm was integrated into a quantitative proteomics program, called ProRata, freely available at www.MSProRata.org.
  • Mass spectrometric analysis of Caldicellulosiruptor obsidiansis cultures grown on four different carbon sources identified 65% of the cell's predicted proteins in cell lysates and supernatants. Biological and technical replication together with sophisticated statistical analysis were used to reliably quantify protein abundances and their changes as a function of carbon source. Extracellular, multifunctional glycosidases were significantly more abundant on cellobiose than on the crystalline cellulose substrates Avicel and filter paper, indicating either disaccharide induction or constitutive protein expression. Highly abundant flagellar, chemotaxis, and pilus proteins were detected during growth on insoluble substrates, suggesting motility or specific substrate attachment. The highly abundant extracellular binding protein COB47-0549 together with the COB47-1616 ATPase might comprise the primary ABC-transport system for cellooligosaccharides, while COB47-0096 and COB47-0097 could facilitate monosaccharide uptake. Oligosaccharide degradation can occur either via extracellular hydrolysis by a GH1 β-glycosidase or by intracellular phosphorolysis using two GH94 enzymes. When C. obsidiansis was grown on switchgrass, the abundance of hemicellulases (including GH3, GH5, GH51, and GH67 enzymes) and certain sugar transporters increased significantly. Cultivation on biomass also caused a concerted increase in cytosolic enzymes for xylose and arabinose fermentation.
  • To design a robust quantitative proteomics study, an understanding of both the inherent heterogeneity of the biological samples being studied and the technical variability of the proteomics methods and platform is needed. Additionally, accurately identifying the technical steps associated with the largest variability would provide valuable information for the improvement and design of future processing pipelines. We present an experimental strategy that allows for a detailed examination of the variability of the quantitative LC-MS proteomics measurements. By replicating analyses at different stages of processing, various technical components can be estimated and their individual contribution to technical variability can be dissected. This design can be easily adapted to other quantitative proteomics pipelines. Herein, we applied this methodology to our label-free workflow for the processing of human brain tissue. For this application, the pipeline was divided into four critical components: tissue dissection and homogenization (extraction), protein denaturation followed by trypsin digestion and SPE clean-up (digestion), short-term run-to-run instrumental response fluctuation (instrumental variance), and long-term drift of the quantitative response of the LC-MS/MS platform over the 2-week period of continuous analysis (instrumental stability). From this analysis, we found the following contributions to variability: extraction (72%) >> instrumental variance (16%) > instrumental stability (8.4%) > digestion (3.1%). Furthermore, the stability of the platform and its suitability for discovery proteomics studies are demonstrated.
  • No abstract prepared.
  • Recent studies have revealed a relationship between protein abundance and sampling statistics, such as sequence coverage, peptide count, and spectral count, in label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics. The use of sampling statistics offers a promising method of measuring relative protein abundance and detecting differentially expressed or coexpressed proteins. We performed a systematic analysis of various approaches to quantifying differential protein expression in eukaryotic Saccharomyces cerevisiae and prokaryotic Rhodopseudomonas palustris label-free LC-MS/MS data. First, we showed that, among three sampling statistics, the spectral count has the highest technical reproducibility, followed by the less-reproducible peptide count and relatively nonreproducible sequence coverage. Second, we used spectral count statistics to measure differential protein expression in pairwise experiments using five statistical tests: Fisher's exact test, G-test, AC test, t-test, and LPE test. Given the S. cerevisiae data set with spiked proteins as a benchmark and the false positive rate as a metric, our evaluation suggested that the Fisher's exact test, G-test, and AC test can be used when the number of replications is limited (one or two), whereas the t-test is useful with three or more replicates available. Third, we generalized the G-test to increase the sensitivity of detecting differential protein expression under multiple experimental conditions. Out of 1622 identified R. palustris proteins in the LC-MS/MS experiment, the generalized G-test detected 1119 differentially expressed proteins under six growth conditions. Finally, we studied correlated expression of these 1119 proteins by analyzing pairwise expression correlations and by delineating protein clusters according to expression patterns. Through pairwise expression correlation analysis, we demonstrated that proteins co-located in the same operon were much more strongly coexpressed than those from different operons. Combining cluster analysis with existing protein functional annotations, we identified six protein clusters with known biological significance. In summary, the proposed generalized G-test using spectral count sampling statistics is a viable methodology for robust quantification of relative protein abundance and for sensitive detection of biologically significant differential protein expression under multiple experimental conditions in label-free shotgun proteomics.
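
The pairwise G-test mentioned in the last record above can be sketched as follows. This is the textbook G-test of independence on a 2x2 table (spectra assigned to one protein versus all other spectra, in two samples), not necessarily the exact formulation or the generalized multi-condition version used in that study. The function name `g_test_2x2` and its arguments are hypothetical.

```python
import math


def g_test_2x2(count_a, total_a, count_b, total_b):
    """Textbook G-test of independence for one protein in two samples.

    count_a, count_b: spectral counts of the protein in samples A and B.
    total_a, total_b: total spectral counts in samples A and B.
    Returns (G statistic, p-value from a chi-square distribution with 1 df).
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("totals must be positive")
    if not (0 <= count_a <= total_a and 0 <= count_b <= total_b):
        raise ValueError("counts must lie between 0 and the sample total")

    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    row_sums = [total_a, total_b]
    grand = total_a + total_b
    col_sums = [count_a + count_b, grand - count_a - count_b]

    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = observed[i][j]
            if obs > 0:  # cells with zero observations contribute nothing
                expected = row_sums[i] * col_sums[j] / grand
                g += 2.0 * obs * math.log(obs / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error

    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


if __name__ == "__main__":
    print(g_test_2x2(50, 1000, 10, 1000))
```

A small p-value suggests the protein's share of spectra differs between the two samples; with many proteins tested at once, a multiple-testing correction would normally follow.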