U.S. Department of Energy
Office of Scientific and Technical Information

In search of genome annotation consistency: solid gene clusters and how to use them

Journal Article · 3 Biotech
 [1];  [2];  [3];  [4];  [5]
  1. Univ. of Illinois at Urbana-Champaign, IL (United States). Inst. for Genomic Biology; OSTI
  2. Univ. of Illinois at Urbana-Champaign, IL (United States). Inst. for Genomic Biology; Univ. of Illinois at Urbana-Champaign, IL (United States). Dept. of Microbiology
  3. Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States); Argonne National Lab. (ANL), Argonne, IL (United States). Mathematics and Computer Science
  4. Fellowship for Interpretation of Genomes, Burr Ridge, IL (United States)
  5. Argonne National Lab. (ANL), Argonne, IL (United States). Mathematics and Computer Science
Maintaining consistency in genome annotations is important for supporting many computational tasks, particularly metabolic modeling. The SEED project has implemented a process that improves annotation consistency across microbial genomes for proteins with conserved sequences and genomic context. In this research report, we describe this process and show how it has improved microbial genome annotations in the SEED. We also compare the annotation consistency of the SEED with that of other commonly used resources: IMG (the Joint Genome Institute’s Integrated Microbial Genomes system), RefSeq (the National Center for Biotechnology Information’s Reference Sequence Database), Swiss-Prot (the annotated protein sequence database of the Swiss Institute of Bioinformatics, European Molecular Biology Laboratory and the European Bioinformatics Institute) and TrEMBL (Translated European Molecular Biology Laboratory nucleotide sequence data Library). Our analysis indicates that manual and computational efforts are paying off for the databases where consistency is a major goal.
Research Organization:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Organization:
National Institutes of Health (NIH); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1815515
Journal Information:
3 Biotech, Vol. 4, Issue 3; ISSN 2190-572X
Publisher:
Springer
Country of Publication:
United States
Language:
English
