Proteogenomic analysis of bacteria and archaea: A 46 organism case study
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. The use of proteomics data for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g., short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 701 novel proteins, 1365 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1095 signal peptides. Proteomics provides a powerful experimental data type to assess and improve the quality of genome annotation. A key advantage is the direct correlation between protein annotation and a protein-based assay. With the adoption of new sequencing technologies, which have higher error rates than Sanger-based methods, and with advances in proteomics, proteogenomics may become even more important in the future.
- Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 1033067
- Report Number(s): PNNL-SA-75723; KP1601010; TRN: US201202%%569
- Journal Information: PLoS One, Vol. 6, Issue 11
- Country of Publication: United States
- Language: English
Similar Records
Proteogenomic Analysis of Burkholderia Species Strains 25 and 46 Isolated from Uraniferous Soils Reveals Multiple Mechanisms to Cope with Uranium Stress
Genome-wide protein localization prediction strategies for gram negative bacteria