DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

Abstract

Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate among a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and to another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). In total, 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines, much as expressed sequence tag (EST) data has been. A comparison with the published genome of another A. niger strain, sequenced by DSM, showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
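The abstract combines two computational steps: filtering peptide identifications to an acceptable FDR with a reversed (decoy) database, and choosing, at each locus, the candidate gene model that best accounts for the identified peptides. The Python sketch that follows illustrates both under simplifying assumptions that are not taken from the paper: the FDR at a score cutoff is estimated as the ratio of reverse-database hits to forward hits (not the APS statistic itself), and "most parsimonious" is read as the model containing the most identified peptides, with ties going to the shorter protein. The names `Psm`, `score_threshold_at_fdr`, `confident_peptides` and `most_parsimonious_model`, and all sequences and scores, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Psm:
    """Hypothetical peptide-spectrum match record."""
    peptide: str
    score: float
    is_decoy: bool  # True if the hit came from the reversed (decoy) database


def score_threshold_at_fdr(psms: list[Psm], max_fdr: float) -> float | None:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    Assumption: FDR at a cutoff is estimated as
    (#decoy hits >= cutoff) / (#target hits >= cutoff).
    """
    best = None
    for cutoff in sorted({p.score for p in psms}, reverse=True):
        passing = [p for p in psms if p.score >= cutoff]
        decoys = sum(p.is_decoy for p in passing)
        targets = len(passing) - decoys
        if targets and decoys / targets <= max_fdr:
            best = cutoff  # remember the lowest cutoff that still meets the target
    return best


def confident_peptides(psms: list[Psm], max_fdr: float) -> set[str]:
    """Forward-database peptides that pass the FDR-derived score cutoff."""
    cutoff = score_threshold_at_fdr(psms, max_fdr)
    if cutoff is None:
        return set()
    return {p.peptide for p in psms if not p.is_decoy and p.score >= cutoff}


def most_parsimonious_model(models: dict[str, str], peptides: set[str]) -> str | None:
    """Pick the candidate gene model at one locus best supported by peptides.

    Assumption: rank by number of identified peptides contained in the
    predicted protein, then prefer the shorter protein, then the name.
    Returns None if no candidate contains any peptide.
    """
    def hits(protein: str) -> int:
        # Plain substring match; ignores I/L ambiguity and cleavage sites.
        return sum(pep in protein for pep in peptides)

    if not models:
        return None
    name = min(models, key=lambda m: (-hits(models[m]), len(models[m]), m))
    return name if hits(models[name]) > 0 else None


# Toy data, invented for illustration only.
EXAMPLE_PSMS = [
    Psm("LVNELTEFAK", 95.0, False),
    Psm("AEFVEVTK", 80.0, False),
    Psm("KATLEVSG", 70.0, True),
    Psm("YLYEIAR", 60.0, False),
    Psm("QDLGSEK", 55.0, True),
]
EXAMPLE_MODELS = {
    "gm1": "MSLVNELTEFAKGG",
    "gm2": "MSLVNELTEFAKAEFVEVTKPP",
    "gm3": "MAAALVNELTEFAKAEFVEVTKPPQQQ",
}

if __name__ == "__main__":
    peptides = confident_peptides(EXAMPLE_PSMS, max_fdr=0.4)
    print(score_threshold_at_fdr(EXAMPLE_PSMS, max_fdr=0.4))  # 60.0
    print(sorted(peptides))  # ['AEFVEVTK', 'LVNELTEFAK', 'YLYEIAR']
    print(most_parsimonious_model(EXAMPLE_MODELS, peptides))  # gm2
```

With this toy data, `gm2` and `gm3` each contain two confident peptides and the tie goes to the shorter `gm2`. A production pipeline would also need to handle leucine/isoleucine ambiguity and tryptic cleavage context, which the substring check deliberately ignores.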

Authors:
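 Wright, James C. [1]; Sugden, Deana [2]; Francis-McIntyre, Sue [2]; Riba-Garcia, Isabel [2]; Gaskell, Simon J. [2]; Grigoriev, Igor V. [3]; Baker, Scott E. [4]; Beynon, Robert J. [5]; Hubbard, Simon J. [2]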
  1. Univ. of Liverpool (United Kingdom); Univ. of Manchester (United Kingdom)
  2. Univ. of Manchester (United Kingdom)
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  4. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  5. Univ. of Liverpool (United Kingdom)
Publication Date:
February 2009
Research Org.:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1556877
Report Number(s):
PNNL-SA-65830
Journal ID: ISSN 1471-2164
Grant/Contract Number:  
AC05-76RL01830
Resource Type:
Accepted Manuscript
Journal Name:
BMC Genomics
Additional Journal Information:
Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1471-2164
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; proteomics; annotation; Aspergillus niger; fungi; ascomycete

Citation Formats

Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., and Hubbard, Simon J. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. United States: N. p., 2009. Web. doi:10.1186/1471-2164-10-61.
Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., & Hubbard, Simon J. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. United States. https://doi.org/10.1186/1471-2164-10-61
Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., and Hubbard, Simon J. 2009. "Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger". United States. https://doi.org/10.1186/1471-2164-10-61. https://www.osti.gov/servlets/purl/1556877.
@article{osti_1556877,
title = {Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger},
author = {Wright, James C. and Sugden, Deana and Francis-McIntyre, Sue and Riba-Garcia, Isabel and Gaskell, Simon J. and Grigoriev, Igor V. and Baker, Scott E. and Beynon, Robert J. and Hubbard, Simon J.},
abstractNote = {Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate among a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and to another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). In total, 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines, much as expressed sequence tag (EST) data has been. A comparison with the published genome of another A. niger strain, sequenced by DSM, showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.},
doi = {10.1186/1471-2164-10-61},
journal = {BMC Genomics},
number = 1,
volume = 10,
place = {United States},
year = {2009},
month = {2}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 29 works
Citation information provided by
Web of Science

Works referenced in this record:

Aspergillus niger genomics: Past, present and into the future
journal, January 2006


The Ensembl Analysis Pipeline
journal, May 2004

  • Potter, Simon C.; Clarke, Laura; Curwen, Val
  • Genome Research, Vol. 14, Issue 5
  • DOI: 10.1101/gr.1859804

Genomic microarrays in the spotlight
journal, February 2004

  • Mantripragada, Kiran K.; Buckley, Patrick G.; Diaz de Ståhl, Teresita
  • Trends in Genetics, Vol. 20, Issue 2
  • DOI: 10.1016/j.tig.2003.12.008

Posterior Error Probabilities and False Discovery Rates: Two Sides of the Same Coin
journal, January 2008

  • Käll, Lukas; Storey, John D.; MacCoss, Michael J.
  • Journal of Proteome Research, Vol. 7, Issue 1
  • DOI: 10.1021/pr700739d

What to do with “one-hit wonders”?
journal, May 2004

  • Veenstra, Timothy D.; Conrads, Thomas P.; Issaq, Haleem J.
  • ELECTROPHORESIS, Vol. 25, Issue 9
  • DOI: 10.1002/elps.200490007

Mass Spectrometric Sequencing of Proteins from Silver-Stained Polyacrylamide Gels
journal, January 1996

  • Shevchenko, Andrej; Wilm, Matthias; Vorm, Ole
  • Analytical Chemistry, Vol. 68, Issue 5
  • DOI: 10.1021/ac950914h

The Bioperl Toolkit: Perl Modules for the Life Sciences
journal, October 2002


Improving gene annotation using peptide mass spectrometry
journal, January 2007


Expression profiling using cDNA microarrays
journal, January 1999

  • Duggan, David J.; Bittner, Michael; Chen, Yidong
  • Nature Genetics, Vol. 21, Issue S1
  • DOI: 10.1038/4434

GAPP: A Fully Automated Software for the Confident Identification of Human Peptides from Tandem Mass Spectra
journal, August 2006

  • Shadforth, Ian; Xu, Weibing; Crowther, Daniel
  • Journal of Proteome Research, Vol. 5, Issue 10
  • DOI: 10.1021/pr060205s

Genome-Scale Proteomics Reveals Arabidopsis thaliana Gene Models and Proteome Dynamics
journal, May 2008


Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics
text, January 2008

  • Baerenfaller, K.; Grossmann, J.; Grobei, M. A.
  • American Association for the Advancement of Science (AAAS)
  • DOI: 10.5167/uzh-11009

False Discovery Rates and Related Statistical Concepts in Mass Spectrometry-Based Proteomics
journal, January 2008

  • Choi, Hyungwon; Nesvizhskii, Alexey I.
  • Journal of Proteome Research, Vol. 7, Issue 1
  • DOI: 10.1021/pr700747q

Proteogenomic mapping as a complementary method to perform genome annotation
journal, July 2003


Proteogenomics: needs and roles to be filled by proteomics in genome annotation
journal, March 2008

  • Ansong, C.; Purvine, S. O.; Adkins, J. N.
  • Briefings in Functional Genomics and Proteomics, Vol. 7, Issue 1
  • DOI: 10.1093/bfgp/eln010

Ensembl 2007
journal, January 2007

  • Hubbard, T. J. P.; Aken, B. L.; Beal, K.
  • Nucleic Acids Research, Vol. 35, Issue Database
  • DOI: 10.1093/nar/gkl996

Achieving In-Depth Proteomics Profiling by Mass Spectrometry
journal, January 2007

  • Ahn, Natalie G.; Shabb, John B.; Old, William M.
  • ACS Chemical Biology, Vol. 2, Issue 1
  • DOI: 10.1021/cb600357d

Mass Spectrometry and Protein Analysis
journal, April 2006


Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88
journal, January 2007

  • Pel, Herman J.; de Winde, Johannes H.; Archer, David B.
  • Nature Biotechnology, Vol. 25, Issue 2, p. 221-231
  • DOI: 10.1038/nbt1282

Modeling a whole organ using proteomics: The avian bursa of Fabricius
journal, May 2006


Differential analysis for high density tiling microarray data
journal, September 2007

  • Ghosh, Srinka; Hirsch, Heather A.; Sekinger, Edward A.
  • BMC Bioinformatics, Vol. 8, Issue 1
  • DOI: 10.1186/1471-2105-8-359

Proteomics technology in systems biology
journal, January 2006

  • Smith, Jeffrey C.; Figeys, Daniel
  • Molecular BioSystems, Vol. 2, Issue 8
  • DOI: 10.1039/b606798k

Confident protein identification using the average peptide score method coupled with search-specific, ab initio thresholds
journal, January 2005

  • Shadforth, Ian; Dunkley, Tom; Lilley, Kathryn
  • Rapid Communications in Mass Spectrometry, Vol. 19, Issue 22
  • DOI: 10.1002/rcm.2203

The peptide atlas project
text, January 2006


Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies
journal, January 2008

  • Searle, Brian C.; Turner, Mark; Nesvizhskii, Alexey I.
  • Journal of Proteome Research, Vol. 7, Issue 1
  • DOI: 10.1021/pr070540w

Proteomics of filamentous fungi
journal, September 2007


Multidimensional protein identification technology: current status and future prospects
journal, January 2005


The PeptideAtlas project
journal, January 2006


cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome
journal, July 2007


Mass spectrometry allows direct identification of proteins in large genomes
journal, April 2001


Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics
journal, January 2006

  • Fermin, Damian; Allen, Baxter B.; Blackwell, Thomas W.
  • Genome Biology, Vol. 7, Issue 4, p. R35
  • DOI: 10.1186/gb-2006-7-4-r35

From the genome sequence to the proteome and back: Evaluation of E. coli genome annotation with a 2-D gel-based proteomics approach
journal, April 2007


Probability-based protein identification by searching sequence databases using mass spectrometry data
journal, December 1999


Ab initio Gene Finding in Drosophila Genomic DNA
journal, April 2000

  • Salamov, Asaf A.; Solovyev, Victor V.
  • Genome Research, Vol. 10, Issue 4, p. 516-522
  • DOI: 10.1101/gr.10.4.516

Using GeneWise in the Drosophila Annotation Experiment
journal, April 2000


Positional proteomics: preparation of amino-terminal peptides as a strategy for proteome simplification and characterization
journal, November 2006


Works referencing / citing this record:

OryzaPG-DB: Rice Proteome Database based on Shotgun Proteogenomics
journal, April 2011

  • Helmy, Mohamed; Tomita, Masaru; Ishihama, Yasushi
  • BMC Plant Biology, Vol. 11, Issue 1
  • DOI: 10.1186/1471-2229-11-63

Genome annotation of a Saccharomyces sp. lager brewer's yeast
journal, September 2016

  • De León-Medina, Patricia Marcela; Elizondo-González, Ramiro; Damas-Buenrostro, Luis Cástulo
  • Genomics Data, Vol. 9
  • DOI: 10.1016/j.gdata.2016.05.009

Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium
journal, August 2011


Expression and export: recombinant protein production systems for Aspergillus
journal, June 2010


AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions
journal, September 2019


Proteomics-based Refinement of Deinococcus deserti Genome Annotation Reveals an Unwonted Use of Non-canonical Translation Initiation Codons
journal, October 2009

  • Baudet, Mathieu; Ortet, Philippe; Gaillard, Jean-Charles
  • Molecular & Cellular Proteomics, Vol. 9, Issue 2
  • DOI: 10.1074/mcp.m900359-mcp200

Proteogenomic analysis of pathogenic yeast Cryptococcus neoformans using high resolution mass spectrometry
journal, February 2014

  • Nagarajha Selvan, Lakshmi Dhevi; Kaviyil, Jyothi Embekkat; Nirujogi, Raja Sekhar
  • Clinical Proteomics, Vol. 11, Issue 1
  • DOI: 10.1186/1559-0275-11-5