Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger
Abstract
Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
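Two ideas in the abstract, estimating an FDR from a reverse (decoy) database search and ranking the candidate gene models at a locus by how many identified peptides they contain, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the decoy-over-target ratio used for the FDR, and the toy sequences are all assumptions made for illustration, and APS scoring itself is not reproduced.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR at a score threshold as decoy hits / target hits.

    Assumes separate searches against the real (target) and reversed
    (decoy) databases; other estimators exist for concatenated searches.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    return decoys / targets if targets else 0.0


def rank_gene_models(peptides, models):
    """Rank candidate gene models at one locus by peptide support.

    `models` maps a (hypothetical) model name to its predicted protein
    sequence; support is the number of peptides found as exact substrings.
    """
    support = {
        name: sum(1 for p in peptides if p in seq)
        for name, seq in models.items()
    }
    # Most supported first; ties broken alphabetically by model name.
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy data only: scores and sequences are invented for illustration.
    fdr = estimate_fdr([50, 40, 30, 20], [35, 10], threshold=25)
    print(f"Estimated FDR: {fdr:.2f}")

    ranking = rank_gene_models(
        ["PEPTIDEK", "LMNR"],
        {"model_a": "MAPEPTIDEKAAALMNRQ", "model_b": "MAPEPTIDEKAAA"},
    )
    print(ranking)
```

In this toy case the model containing both peptides ranks above the model containing only one, which mirrors the abstract's point that peptide evidence can favour a model other than the annotators' chosen "best" one.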
- Authors:
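- Wright, James C.; Sugden, Deana; Francis-McIntyre, Sue; Riba-Garcia, Isabel; Gaskell, Simon J.; Grigoriev, Igor V.; Baker, Scott E.; Beynon, Robert J.; Hubbard, Simon J.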
- Univ. of Liverpool (United Kingdom); Univ. of Manchester (United Kingdom)
- Univ. of Manchester (United Kingdom)
- USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Univ. of Liverpool (United Kingdom)
- Publication Date:
- Research Org.:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1556877
- Report Number(s):
- PNNL-SA-65830; Journal ID: ISSN 1471-2164
- Grant/Contract Number:
- AC05-76RL01830
- Resource Type:
- Accepted Manuscript
- Journal Name:
- BMC Genomics
- Additional Journal Information:
- Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1471-2164
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; proteomics; annotation; Aspergillus niger; fungi; ascomycete
Citation Formats
Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., and Hubbard, Simon J. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. United States: N. p., 2009. Web. doi:10.1186/1471-2164-10-61.
Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., & Hubbard, Simon J. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. United States. https://doi.org/10.1186/1471-2164-10-61
Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., and Hubbard, Simon J. Wed .
"Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger". United States. https://doi.org/10.1186/1471-2164-10-61. https://www.osti.gov/servlets/purl/1556877.
@article{osti_1556877,
title = {Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger},
author = {Wright, James C. and Sugden, Deana and Francis-McIntyre, Sue and Riba-Garcia, Isabel and Gaskell, Simon J. and Grigoriev, Igor V. and Baker, Scott E. and Beynon, Robert J. and Hubbard, Simon J.},
abstractNote = {Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.},
doi = {10.1186/1471-2164-10-61},
journal = {BMC Genomics},
number = 1,
volume = 10,
place = {United States},
year = {2009},
month = {2}
}
Works referenced in this record:
Aspergillus niger genomics: Past, present and into the future
journal, January 2006
- Baker, Scott E.
- Medical Mycology, Vol. 44, Issue s1
The Ensembl Analysis Pipeline
journal, May 2004
- Potter, Simon C.; Clarke, Laura; Curwen, Val
- Genome Research, Vol. 14, Issue 5
Genomic microarrays in the spotlight
journal, February 2004
- Mantripragada, Kiran K.; Buckley, Patrick G.; Diaz de Ståhl, Teresita
- Trends in Genetics, Vol. 20, Issue 2
Posterior Error Probabilities and False Discovery Rates: Two Sides of the Same Coin
journal, January 2008
- Käll, Lukas; Storey, John D.; MacCoss, Michael J.
- Journal of Proteome Research, Vol. 7, Issue 1
What to do with “one-hit wonders”?
journal, May 2004
- Veenstra, Timothy D.; Conrads, Thomas P.; Issaq, Haleem J.
- ELECTROPHORESIS, Vol. 25, Issue 9
Mass Spectrometric Sequencing of Proteins from Silver-Stained Polyacrylamide Gels
journal, January 1996
- Shevchenko, Andrej; Wilm, Matthias; Vorm, Ole
- Analytical Chemistry, Vol. 68, Issue 5
The Bioperl Toolkit: Perl Modules for the Life Sciences
journal, October 2002
- Stajich, J. E.
- Genome Research, Vol. 12, Issue 10
Improving gene annotation using peptide mass spectrometry
journal, January 2007
- Tanner, S.; Shen, Z.; Ng, J.
- Genome Research, Vol. 17, Issue 2
Expression profiling using cDNA microarrays
journal, January 1999
- Duggan, David J.; Bittner, Michael; Chen, Yidong
- Nature Genetics, Vol. 21, Issue S1
GAPP: A Fully Automated Software for the Confident Identification of Human Peptides from Tandem Mass Spectra
journal, August 2006
- Shadforth, Ian; Xu, Weibing; Crowther, Daniel
- Journal of Proteome Research, Vol. 5, Issue 10
Genome-Scale Proteomics Reveals Arabidopsis thaliana Gene Models and Proteome Dynamics
journal, May 2008
- Baerenfaller, K.; Grossmann, J.; Grobei, M. A.
- Science, Vol. 320, Issue 5878
Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics
text, January 2008
- Baerenfaller, K.; Grossmann, J.; Grobei, M. A.
- American Association for the Advancement of Science (AAAS)
False Discovery Rates and Related Statistical Concepts in Mass Spectrometry-Based Proteomics
journal, January 2008
- Choi, Hyungwon; Nesvizhskii, Alexey I.
- Journal of Proteome Research, Vol. 7, Issue 1
From the genome sequence to the proteome and back: evaluation of E. coli genome annotation with a 2-D gel-based proteomics approach
text, January 2007
- Maillet, I.; Berndt, P.; Malo, C.
- Wiley-Blackwell
Proteogenomic mapping as a complementary method to perform genome annotation
journal, July 2003
- Jaffe, Jacob D.; Berg, Howard C.; Church, George M.
- PROTEOMICS, Vol. 4, Issue 1
Proteogenomics: needs and roles to be filled by proteomics in genome annotation
journal, March 2008
- Ansong, C.; Purvine, S. O.; Adkins, J. N.
- Briefings in Functional Genomics and Proteomics, Vol. 7, Issue 1
Ensembl 2007
journal, January 2007
- Hubbard, T. J. P.; Aken, B. L.; Beal, K.
- Nucleic Acids Research, Vol. 35, Issue Database
Achieving In-Depth Proteomics Profiling by Mass Spectrometry
journal, January 2007
- Ahn, Natalie G.; Shabb, John B.; Old, William M.
- ACS Chemical Biology, Vol. 2, Issue 1
Experimental determination of translational starts using peptide mass mapping and tandem mass spectrometry within the proteome of Mycobacterium tuberculosis
journal, February 2007
- Rison, Stuart C. G.; Mattow, Jens; Jungblut, Peter R.
- Microbiology, Vol. 153, Issue 2
Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88
journal, January 2007
- Pel, Herman J.; de Winde, Johannes H.; Archer, David B.
- Nature Biotechnology, Vol. 25, Issue 2, p. 221-231
Modeling a whole organ using proteomics: The avian bursa of Fabricius
journal, May 2006
- McCarthy, Fiona M.; Cooksey, Amanda M.; Wang, Nan
- PROTEOMICS, Vol. 6, Issue 9
Differential analysis for high density tiling microarray data
journal, September 2007
- Ghosh, Srinka; Hirsch, Heather A.; Sekinger, Edward A.
- BMC Bioinformatics, Vol. 8, Issue 1
Proteomics technology in systems biology
journal, January 2006
- Smith, Jeffrey C.; Figeys, Daniel
- Molecular BioSystems, Vol. 2, Issue 8
Confident protein identification using the average peptide score method coupled with search-specific, ab initio thresholds
journal, January 2005
- Shadforth, Ian; Dunkley, Tom; Lilley, Kathryn
- Rapid Communications in Mass Spectrometry, Vol. 19, Issue 22
The peptide atlas project
text, January 2006
- Desiere, Frank; Deutsch, Eric W.; King, Nichole L.
- ETH Zurich
Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies
journal, January 2008
- Searle, Brian C.; Turner, Mark; Nesvizhskii, Alexey I.
- Journal of Proteome Research, Vol. 7, Issue 1
Expanding the organismal scope of proteomics: Cross-species protein identification by mass spectrometry and its implications
journal, January 2003
- Liska, Adam J.; Shevchenko, Andrej
- PROTEOMICS, Vol. 3, Issue 1
Proteomics of filamentous fungi
journal, September 2007
- Kim, Yonghyun; Nandakumar, M. P.; Marten, Mark R.
- Trends in Biotechnology, Vol. 25, Issue 9
Multidimensional protein identification technology: current status and future prospects
journal, January 2005
- Kislinger, Thomas; Emili, Andrew
- Expert Review of Proteomics, Vol. 2, Issue 1
The PeptideAtlas project
journal, January 2006
- Desiere, F.
- Nucleic Acids Research, Vol. 34, Issue 90001
cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome
journal, July 2007
- Lu, Fangli; Jiang, Hongying; Ding, Jinhui
- BMC Genomics, Vol. 8, Issue 1
Mass spectrometry allows direct identification of proteins in large genomes
journal, April 2001
- Küster, Bernhard; Mortensen, Peter; Andersen, Jens S.
- PROTEOMICS, Vol. 1, Issue 5
Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics
journal, January 2006
- Fermin, Damian; Allen, Baxter B.; Blackwell, Thomas W.
- Genome Biology, Vol. 7, Issue 4, p. R35
From the genome sequence to the proteome and back: Evaluation of E. coli genome annotation with a 2-D gel-based proteomics approach
journal, April 2007
- Maillet, Isabelle; Berndt, Peter; Malo, Céline
- PROTEOMICS, Vol. 7, Issue 7
Probability-based protein identification by searching sequence databases using mass spectrometry data
journal, December 1999
- Perkins, David N.; Pappin, Darryl J. C.; Creasy, David M.
- Electrophoresis, Vol. 20, Issue 18
Ab initio Gene Finding in Drosophila Genomic DNA
journal, April 2000
- Salamov, Asaf A.; Solovyev, Victor V.
- Genome Research, Vol. 10, Issue 4, p. 516-522
Using GeneWise in the Drosophila Annotation Experiment
journal, April 2000
- Birney, E.
- Genome Research, Vol. 10, Issue 4
Positional proteomics: preparation of amino-terminal peptides as a strategy for proteome simplification and characterization
journal, November 2006
- McDonald, Lucy; Beynon, Robert J.
- Nature Protocols, Vol. 1, Issue 4
Works referencing / citing this record:
OryzaPG-DB: Rice Proteome Database based on Shotgun Proteogenomics
journal, April 2011
- Helmy, Mohamed; Tomita, Masaru; Ishihama, Yasushi
- BMC Plant Biology, Vol. 11, Issue 1
Genome annotation of a Saccharomyces sp. lager brewer's yeast
journal, September 2016
- De León-Medina, Patricia Marcela; Elizondo-González, Ramiro; Damas-Buenrostro, Luis Cástulo
- Genomics Data, Vol. 9
Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium
journal, August 2011
- Ansong, Charles; Tolić, Nikola; Purvine, Samuel O.
- BMC Genomics, Vol. 12, Issue 1
Expression and export: recombinant protein production systems for Aspergillus
journal, June 2010
- Fleißner, André; Dersch, Petra
- Applied Microbiology and Biotechnology, Vol. 87, Issue 4
Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum
journal, September 2009
- Bringans, Scott; Hane, James K.; Casey, Tammy
- BMC Bioinformatics, Vol. 10, Issue 1
AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions
journal, September 2019
- Korandla, Deepank R.; Wozniak, Jacob M.; Campeau, Anaamika
- Bioinformatics
Proteomics-based Refinement of Deinococcus deserti Genome Annotation Reveals an Unwonted Use of Non-canonical Translation Initiation Codons
journal, October 2009
- Baudet, Mathieu; Ortet, Philippe; Gaillard, Jean-Charles
- Molecular & Cellular Proteomics, Vol. 9, Issue 2
Proteogenomic analysis of pathogenic yeast Cryptococcus neoformans using high resolution mass spectrometry
journal, February 2014
- Nagarajha Selvan, Lakshmi Dhevi; Kaviyil, Jyothi Embekkat; Nirujogi, Raja Sekhar
- Clinical Proteomics, Vol. 11, Issue 1