OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis

Abstract

The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass-spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). Across the models, the AAI between orthologs in the sample and the database at which half of the proteins in a proteomic experiment can be identified ranged from 77 to 92%. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.
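The abstract does not spell out the probability-based model, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes that every residue of an ortholog independently matches the database with probability equal to the AAI (as a fraction), that a peptide is identified only if it matches the database exactly, and that a protein counts as identified when at least `min_peptides` of its `n_peptides` detectable peptides match. The function names, the single fixed peptide `length`, and the two-peptide default are hypothetical choices for illustration. Bisection then locates the AAI at which the identification probability falls to one half.

```python
from math import comb

# Hypothetical random-substitution sketch; NOT the model used in the paper.


def peptide_match_prob(aai: float, length: int) -> float:
    """Probability that a peptide of `length` residues has no substitution,
    assuming each residue independently matches with probability `aai`."""
    return aai ** length


def protein_id_prob(aai: float, n_peptides: int, length: int,
                    min_peptides: int = 2) -> float:
    """Probability that at least `min_peptides` of `n_peptides` equal-length
    peptides match the database exactly (binomial upper tail)."""
    q = peptide_match_prob(aai, length)
    # Probability that fewer than min_peptides peptides survive unmutated.
    miss = sum(comb(n_peptides, i) * q ** i * (1 - q) ** (n_peptides - i)
               for i in range(min_peptides))
    return 1.0 - miss


def aai_at_half_identification(n_peptides: int, length: int,
                               min_peptides: int = 2,
                               tol: float = 1e-6) -> float:
    """Bisect for the AAI at which the identification probability is 0.5.
    protein_id_prob increases monotonically with aai, so bisection applies."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if protein_id_prob(mid, n_peptides, length, min_peptides) < 0.5:
            lo = mid  # threshold lies at higher identity
        else:
            hi = mid  # threshold lies at lower identity
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"{aai_at_half_identification(n_peptides=10, length=10):.3f}")
```

With ten 10-residue peptides and a two-peptide rule, the sketch puts the half-identification point at roughly 0.83 AAI. Longer peptides or a stricter `min_peptides` push the threshold toward higher AAI. Because it treats substitutions as random and ignores substitutions that create or remove cleavage sites, the sketch shares the limitation that the authors addressed with in silico peptide data from sequenced isolates.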

Authors:
 Denef, Vincent [1]; Shah, Manesh B [2]; Verberkmoes, Nathan C [2]; Hettich, Robert [3]; Banfield, Jillian F. [1]
  1. University of California, Berkeley
  2. ORNL
  3. ORNL
Publication Date:
2007
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
932169
DOE Contract Number:
DE-AC05-00OR22725
Resource Type:
Journal Article
Resource Relation:
Journal Name: Journal of Proteome Research; Journal Volume: 6; Journal Issue: 8
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; AMINO ACIDS; DISTRIBUTION; MUTATIONS; PROTEINS; SIMULATION; SURGES
Keywords:
proteomics strain variation community genomics metagenomics liquid chromatography mass

Citation Formats

Denef, Vincent, Shah, Manesh B, Verberkmoes, Nathan C, Hettich, Robert, and Banfield, Jillian F. Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis. United States: N. p., 2007. Web. doi:10.1021/pr0701005.
Denef, Vincent, Shah, Manesh B, Verberkmoes, Nathan C, Hettich, Robert, & Banfield, Jillian F. Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis. United States. doi:10.1021/pr0701005.
Denef, Vincent, Shah, Manesh B, Verberkmoes, Nathan C, Hettich, Robert, and Banfield, Jillian F. 2007. "Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis". United States. doi:10.1021/pr0701005.
@article{osti_932169,
title = {Implications of Strain- and Species-Level Sequence Divergence for Community and Isolate Shotgun Proteomic Analysis},
author = {Denef, Vincent and Shah, Manesh B and Verberkmoes, Nathan C and Hettich, Robert and Banfield, Jillian F.},
abstractNote = {The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass-spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). Across the models, the AAI between orthologs in the sample and the database at which half of the proteins in a proteomic experiment can be identified ranged from 77 to 92%. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.},
doi = {10.1021/pr0701005},
journal = {Journal of Proteome Research},
number = 8,
volume = 6,
place = {United States},
year = {2007},
month = jan
}