OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis

Abstract

The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information from one strain or environmental sample can be used to profile the proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database, using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of the nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). Across the range of models, the AAI between orthologs in the sample and database at which half of the proteins in a proteomic experiment can be identified fell between 77 and 92%. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.
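The abstract names a probability-based model linking AAI to identification levels but does not state its form. As an illustration only, the Python sketch below assumes (i) every residue is conserved independently with probability equal to the AAI, so a peptide of n residues survives intact with probability AAI^n, and (ii) a protein counts as identified when at least k of its m observable peptides still match the database (a minimum-peptide rule). Both assumptions, the parameter values, and the function names (`protein_id_prob`, `half_identification_aai`) are hypothetical and are not taken from the study; the independence assumption in particular ignores the nonrandom distribution of mutations that the authors examined separately.

```python
from math import comb


def peptide_match_prob(aai: float, length: int) -> float:
    """Probability that a peptide of `length` residues is unchanged.

    Hypothetical model: each residue is conserved independently with
    probability `aai` (AAI expressed as a fraction, e.g. 0.95).
    """
    return aai ** length


def protein_id_prob(aai: float, n_peptides: int, length: int, min_peptides: int) -> float:
    """Probability that at least `min_peptides` of `n_peptides` observable
    peptides still match the database, i.e. the upper tail of a
    Binomial(n_peptides, p) distribution with p = aai ** length."""
    p = peptide_match_prob(aai, length)
    return sum(
        comb(n_peptides, k) * p**k * (1 - p) ** (n_peptides - k)
        for k in range(min_peptides, n_peptides + 1)
    )


def half_identification_aai(n_peptides: int, length: int, min_peptides: int,
                            tol: float = 1e-6) -> float:
    """Bisection search for the AAI at which protein_id_prob falls to 0.5.

    protein_id_prob increases monotonically with aai (it is 0 at aai = 0
    when min_peptides >= 1 and 1 at aai = 1 when min_peptides <= n_peptides),
    so a single crossing of 0.5 exists in [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if protein_id_prob(mid, n_peptides, length, min_peptides) < 0.5:
            lo = mid  # still below half identified: crossing lies at higher AAI
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, `half_identification_aai(n_peptides=10, length=12, min_peptides=2)` returns the AAI at which half of such hypothetical proteins remain identifiable under these assumptions; the parameter values are arbitrary illustrations, not values reported in the study.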

Authors:
 Denef, Vincent [1]; Shah, Manesh B [2]; Verberkmoes, Nathan C [2]; Hettich, Robert L [3]; Banfield, Jillian F. [1]
  1. University of California, Berkeley
  2. ORNL
  3. ORNL
Publication Date:
2007
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
932169
DOE Contract Number:
DE-AC05-00OR22725
Resource Type:
Journal Article
Resource Relation:
Journal Name: Journal of Proteome Research; Journal Volume: 6; Journal Issue: 8
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; AMINO ACIDS; DISTRIBUTION; MUTATIONS; PROTEINS; SIMULATION; SURGES
Keywords:
proteomics strain variation community genomics metagenomics liquid chromatography mass

Citation Formats

Denef, Vincent, Shah, Manesh B, Verberkmoes, Nathan C, Hettich, Robert, and Banfield, Jillian F. Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis. United States: N. p., 2007. Web. doi:10.1021/pr0701005.
Denef, Vincent, Shah, Manesh B, Verberkmoes, Nathan C, Hettich, Robert, & Banfield, Jillian F. (2007). Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis. United States. doi:10.1021/pr0701005.
Denef, Vincent, Shah, Manesh B, Verberkmoes, Nathan C, Hettich, Robert, and Banfield, Jillian F. 2007. "Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis." United States. doi:10.1021/pr0701005.
@article{osti_932169,
title = {Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis},
author = {Denef, Vincent and Shah, Manesh B and Verberkmoes, Nathan C and Hettich, Robert and Banfield, Jillian F.},
abstractNote = {The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information from one strain or environmental sample can be used to profile the proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database, using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of the nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). Across the range of models, the AAI between orthologs in the sample and database at which half of the proteins in a proteomic experiment can be identified fell between 77 and 92%. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.},
doi = {10.1021/pr0701005},
journal = {Journal of Proteome Research},
number = 8,
volume = 6,
place = {United States},
year = {2007}
}