DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Curated BLAST for Genomes

Abstract

Curated BLAST for Genomes finds candidate genes for a process or an enzymatic activity within a genome of interest. In contrast to annotation tools, which usually predict a single activity for each protein, Curated BLAST asks if any of the proteins in the genome are similar to characterized proteins that are relevant. Given a query such as an enzyme’s name or an EC number, Curated BLAST searches the curated descriptions of over 100,000 characterized proteins, and it compares the relevant characterized proteins to the predicted proteins in the genome of interest. In case of errors in the gene models, Curated BLAST also searches the six-frame translation of the genome. Curated BLAST is available at http://papers.genomics.lbl.gov/curated.

IMPORTANCE Given a microbe’s genome sequence, we often want to predict what capabilities the organism has, such as which nutrients it requires or which energy sources it can use. Or, we know the organism has a capability and we want to find the genes involved. Scientists often use automated gene annotations to find relevant genes, but automated annotations are often vague or incorrect. Curated BLAST finds candidate genes for a capability without relying on automated annotations. First, Curated BLAST finds proteins (usually from other organisms) whose functions have been studied experimentally and whose curated descriptions match a query. Then, it searches the genome of interest for similar proteins and returns a list of candidates. Curated BLAST is fast and often finds relevant genes that are missed by automated annotation.
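
The two ideas in the abstract that are easiest to make concrete are the text match against curated descriptions and the six-frame translation of the genome. The following is a minimal Python sketch of both, not the tool's actual implementation. The function names (`match_descriptions`, `six_frame_translation`), the dictionary layout for curated proteins, and the simple substring match are assumptions made for illustration. The sequence-similarity search that compares the matched proteins to the genome is deliberately left out.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate all three offsets on both strands (hypothetical helper).

    Keys are '+1', '+2', '+3' for the forward strand and '-1', '-2', '-3'
    for the reverse complement.
    """
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def match_descriptions(curated, query):
    """Return ids of curated proteins whose description contains the query.

    `curated` maps a protein id to its description. A case-insensitive
    substring match is an assumption; the real tool may match differently.
    """
    q = query.lower()
    return [pid for pid, desc in curated.items() if q in desc.lower()]
```

In a full pipeline, the protein ids returned by `match_descriptions` would select the characterized sequences to compare against both the predicted proteins and the translated frames, so that a gene missed or truncated by the gene caller can still produce a hit.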

Authors:
Price, Morgan N. [1]; Arkin, Adam P. [1]
  1. Lawrence Berkeley National Laboratory, Berkeley, California, USA
Publication Date:
2019-04-30
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER)
OSTI Identifier:
1504291
Alternate Identifier(s):
OSTI ID: 1508064; OSTI ID: 1777944
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Published Article
Journal Name:
mSystems
Additional Journal Information:
Journal Name: mSystems; Journal Volume: 4; Journal Issue: 2; Journal ID: ISSN 2379-5077
Publisher:
American Society for Microbiology
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; annotation

Citation Formats

Price, Morgan N., Arkin, Adam P., and Greene, ed., Casey S. Curated BLAST for Genomes. United States: N. p., 2019. Web. doi:10.1128/mSystems.00072-19.
Price, Morgan N., Arkin, Adam P., & Greene, ed., Casey S. Curated BLAST for Genomes. United States. https://doi.org/10.1128/mSystems.00072-19
Price, Morgan N., Arkin, Adam P., and Greene, ed., Casey S. 2019. "Curated BLAST for Genomes". United States. https://doi.org/10.1128/mSystems.00072-19.
@article{osti_1504291,
title = {Curated BLAST for Genomes},
author = {Price, Morgan N. and Arkin, Adam P.},
editor = {Greene, Casey S.},
abstractNote = {Curated BLAST for Genomes finds candidate genes for a process or an enzymatic activity within a genome of interest. In contrast to annotation tools, which usually predict a single activity for each protein, Curated BLAST asks if any of the proteins in the genome are similar to characterized proteins that are relevant. Given a query such as an enzyme’s name or an EC number, Curated BLAST searches the curated descriptions of over 100,000 characterized proteins, and it compares the relevant characterized proteins to the predicted proteins in the genome of interest. In case of errors in the gene models, Curated BLAST also searches the six-frame translation of the genome. Curated BLAST is available at http://papers.genomics.lbl.gov/curated. IMPORTANCE Given a microbe’s genome sequence, we often want to predict what capabilities the organism has, such as which nutrients it requires or which energy sources it can use. Or, we know the organism has a capability and we want to find the genes involved. Scientists often use automated gene annotations to find relevant genes, but automated annotations are often vague or incorrect. Curated BLAST finds candidate genes for a capability without relying on automated annotations. First, Curated BLAST finds proteins (usually from other organisms) whose functions have been studied experimentally and whose curated descriptions match a query. Then, it searches the genome of interest for similar proteins and returns a list of candidates. Curated BLAST is fast and often finds relevant genes that are missed by automated annotation.},
doi = {10.1128/mSystems.00072-19},
journal = {mSystems},
number = 2,
volume = 4,
place = {United States},
year = {2019},
month = apr
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1128/mSystems.00072-19

Citation Metrics:
Cited by: 7 works
Citation information provided by
Web of Science
