OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

Abstract

Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR).

405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models.

This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
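The two ideas the abstract relies on, estimating an FDR from a reverse (decoy) database search and comparing candidate gene models by the peptides they contain, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `decoy_fdr` and `rank_models`, the toy scores and sequences, and the use of a simple peptide count as the ranking criterion are all assumptions made for illustration. Average Peptide Scoring itself is not implemented here.

```python
from typing import Dict, List, Tuple


def decoy_fdr(target_scores: List[float],
              decoy_scores: List[float],
              threshold: float) -> float:
    """Estimate the FDR at a score threshold as decoy hits / target hits.

    Assumption: target and decoy (reversed) databases are searched
    separately and are of equal size.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    return decoys / targets if targets else 0.0


def rank_models(models: Dict[str, str],
                peptides: List[str]) -> List[Tuple[str, int]]:
    """Rank candidate gene models at one locus by peptide support.

    Support is the number of identified peptides found as exact
    substrings of the predicted protein sequence (hypothetical criterion).
    Ties are broken alphabetically by model name.
    """
    support = {
        name: sum(1 for pep in peptides if pep in seq)
        for name, seq in models.items()
    }
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy search scores, invented for this example.
    target = [30.0, 25.0, 20.0, 10.0]
    decoy = [12.0, 8.0, 5.0, 3.0]
    print(decoy_fdr(target, decoy, threshold=11.0))

    # Two hypothetical gene models predicted at the same locus.
    locus_models = {
        "model_a": "MKTAYIAKQRLSDEGHK",
        "model_b": "MKTAYIAKQR",
    }
    identified = ["TAYIAK", "LSDEGHK"]
    print(rank_models(locus_models, identified))
```

In this toy case `model_a` contains both peptides and `model_b` only one, so `model_a` ranks first. A locus where the annotators' preferred model is not the top-ranked one would correspond to the kind of discrepancy the abstract reports for 13 loci.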

Authors:
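Wright, James C. [1]; Sugden, Deana [2]; Francis-McIntyre, Sue [2]; Riba-Garcia, Isabel [2]; Gaskell, Simon J. [2]; Grigoriev, Igor V. [3]; Baker, Scott E. [4]; Beynon, Robert J. [5]; Hubbard, Simon J. [2]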
  1. Univ. of Liverpool (United Kingdom); Univ. of Manchester (United Kingdom)
  2. Univ. of Manchester (United Kingdom)
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  4. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  5. Univ. of Liverpool (United Kingdom)
Publication Date:
2009-02-04
Research Org.:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1556877
Report Number(s):
PNNL-SA-65830
Journal ID: ISSN 1471-2164
Grant/Contract Number:  
AC05-76RL01830
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
BMC Genomics
Additional Journal Information:
Journal Volume: 10; Journal Issue: 1; Journal ID: ISSN 1471-2164
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; proteomics; annotation; Aspergillus niger; fungi; ascomycete

Citation Formats

Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., and Hubbard, Simon J. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. United States: N. p., 2009. Web. doi:10.1186/1471-2164-10-61.
Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., & Hubbard, Simon J. Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger. United States. https://doi.org/10.1186/1471-2164-10-61
Wright, James C., Sugden, Deana, Francis-McIntyre, Sue, Riba-Garcia, Isabel, Gaskell, Simon J., Grigoriev, Igor V., Baker, Scott E., Beynon, Robert J., and Hubbard, Simon J. 2009. "Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger". United States. https://doi.org/10.1186/1471-2164-10-61. https://www.osti.gov/servlets/purl/1556877.
@article{osti_1556877,
title = {Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger},
author = {Wright, James C. and Sugden, Deana and Francis-McIntyre, Sue and Riba-Garcia, Isabel and Gaskell, Simon J. and Grigoriev, Igor V. and Baker, Scott E. and Beynon, Robert J. and Hubbard, Simon J.},
abstractNote = {Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate over a set of candidate gene models. Here we apply this to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and another predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR). 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models. This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison of the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.},
doi = {10.1186/1471-2164-10-61},
url = {https://www.osti.gov/biblio/1556877},
journal = {BMC Genomics},
issn = {1471-2164},
number = 1,
volume = 10,
place = {United States},
year = {2009},
month = feb
}

Citation Metrics:
Cited by: 29 works
Citation information provided by
Web of Science

Works referencing / citing this record:

OryzaPG-DB: Rice Proteome Database based on Shotgun Proteogenomics
journal, April 2011


Genome annotation of a Saccharomyces sp. lager brewer's yeast
journal, September 2016


Expression and export: recombinant protein production systems for Aspergillus
journal, June 2010


AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions
journal, September 2019


Proteomics-based Refinement of Deinococcus deserti Genome Annotation Reveals an Unwonted Use of Non-canonical Translation Initiation Codons
journal, October 2009


Proteogenomic analysis of pathogenic yeast Cryptococcus neoformans using high resolution mass spectrometry
journal, February 2014