OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploiting proteomic data for genome annotation and gene model validation in Aspergillus niger

Journal Article · BMC Genomics
 [1];  [2];  [2];  [2];  [2];  [3];  [4];  [5];  [2]
  1. Univ. of Liverpool (United Kingdom); Univ. of Manchester (United Kingdom)
  2. Univ. of Manchester (United Kingdom)
  3. USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)
  4. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
  5. Univ. of Liverpool (United Kingdom)

Proteomic data is a potentially rich, but arguably unexploited, data source for genome annotation. Peptide identifications from tandem mass spectrometry provide prima facie evidence for gene predictions and can discriminate among a set of candidate gene models. Here we apply this approach to the recently sequenced Aspergillus niger fungal genome from the Joint Genome Institute (JGI) and to a second predicted protein set from another A. niger sequence. Tandem mass spectra (MS/MS) were acquired from 1D gel electrophoresis bands and searched against all available gene models using Average Peptide Scoring (APS) and reverse database searching to produce confident identifications at an acceptable false discovery rate (FDR).

In total, 405 identified peptide sequences were mapped to 214 different A. niger genomic loci to which 4093 predicted gene models clustered, 2872 of which contained the mapped peptides. Interestingly, 13 (6%) of these loci either had no preferred predicted gene model or the genome annotators' chosen "best" model for that genomic locus was not found to be the most parsimonious match to the identified peptides. The peptides identified also boosted confidence in predicted gene structures spanning 54 introns from different gene models.

This work highlights the potential of integrating experimental proteomics data into genomic annotation pipelines much as expressed sequence tag (EST) data has been. A comparison with the published genome from another strain of A. niger sequenced by DSM showed that a number of the gene models or proteins with proteomics evidence did not occur in both genomes, further highlighting the utility of the method.
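
The two ideas the abstract relies on, estimating an FDR from a reverse (decoy) database search and asking which candidate gene model at a locus best accounts for the identified peptides, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequences, the simple decoy/target ratio, and the use of plain substring matching in place of Average Peptide Scoring and a full parsimony analysis are all assumptions made for illustration.

```python
from collections import defaultdict


def decoy_fdr(hits, threshold):
    """Estimate the FDR at a score threshold from (score, is_decoy) pairs.

    Assumption: FDR is approximated as decoy hits / target hits among
    all hits scoring at or above the threshold.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def best_models_per_locus(peptides, models):
    """For each locus, list the gene models that contain the most peptides.

    models maps a hypothetical model ID to (locus ID, predicted protein sequence).
    A locus whose models contain no peptide gets an empty list.
    """
    counts_by_locus = defaultdict(dict)
    for model_id, (locus, sequence) in models.items():
        # Count identified peptides found verbatim in the predicted protein.
        counts_by_locus[locus][model_id] = sum(1 for p in peptides if p in sequence)

    best = {}
    for locus, counts in counts_by_locus.items():
        top = max(counts.values())
        best[locus] = sorted(m for m, c in counts.items() if c == top and top > 0)
    return best


# Toy data, invented for this example only.
hits = [(50, False), (45, False), (40, True), (30, False), (20, True)]
peptides = ["AAGK", "LLDR"]
models = {
    "m1": ("locus1", "MAAGKTTT"),
    "m2": ("locus1", "MAAGKPLLDRQ"),
    "m3": ("locus2", "MSSS"),
}

fdr_at_35 = decoy_fdr(hits, 35)
preferred = best_models_per_locus(peptides, models)
```

With the toy data, `m2` is preferred at `locus1` because it contains both peptides, while `locus2` has no supported model. A locus where several models tie, or where the annotators' chosen model is not in the returned list, corresponds to the kind of case the abstract reports for 13 of the 214 loci.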

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1556877
Report Number(s):
PNNL-SA-65830
Journal Information:
BMC Genomics, Vol. 10, Issue 1; ISSN 1471-2164
Publisher:
Springer
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 29 works
Citation information provided by
Web of Science

Cited By (8)

OryzaPG-DB: Rice Proteome Database based on Shotgun Proteogenomics journal April 2011
Genome annotation of a Saccharomyces sp. lager brewer's yeast journal September 2016
Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium journal August 2011
Expression and export: recombinant protein production systems for Aspergillus journal June 2010
Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum journal September 2009
AssessORF: combining evolutionary conservation and proteomics to assess prokaryotic gene predictions journal September 2019
Proteomics-based Refinement of Deinococcus deserti Genome Annotation Reveals an Unwonted Use of Non-canonical Translation Initiation Codons journal October 2009
Proteogenomic analysis of pathogenic yeast Cryptococcus neoformans using high resolution mass spectrometry journal February 2014