DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Protein Sequence Annotation Tool (PSAT): a centralized web-based meta-server for high-throughput sequence annotations

Abstract

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. We demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA format. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

Authors:
Leung, Elo; Huang, Amy; Cadag, Eithon; Montana, Aldrin; Soliman, Jan Lorenz; Zhou, Carol L. Ecale
Publication Date:
2016-01-20
Research Org.:
Lawrence Livermore National Security; Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1618521
Alternate Identifier(s):
OSTI ID: 1238774; OSTI ID: 1305875
Report Number(s):
LLNL-JRNL-664411
Journal ID: ISSN 1471-2105; 43; PII: 887
Grant/Contract Number:  
AC52-07NA27344; PE0603384BP-B0946791; SCW1039
Resource Type:
Published Article
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Name: BMC Bioinformatics; Journal Volume: 17; Journal Issue: 1; Journal ID: ISSN 1471-2105
Publisher:
Springer Science + Business Media
Country of Publication:
United Kingdom
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; 97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE

Citation Formats

Leung, Elo, Huang, Amy, Cadag, Eithon, Montana, Aldrin, Soliman, Jan Lorenz, and Zhou, Carol L. Ecale. Protein Sequence Annotation Tool (PSAT): a centralized web-based meta-server for high-throughput sequence annotations. United Kingdom: N. p., 2016. Web. doi:10.1186/s12859-016-0887-y.
Leung, Elo, Huang, Amy, Cadag, Eithon, Montana, Aldrin, Soliman, Jan Lorenz, & Zhou, Carol L. Ecale. (2016). Protein Sequence Annotation Tool (PSAT): a centralized web-based meta-server for high-throughput sequence annotations. United Kingdom. https://doi.org/10.1186/s12859-016-0887-y
Leung, Elo, Huang, Amy, Cadag, Eithon, Montana, Aldrin, Soliman, Jan Lorenz, and Zhou, Carol L. Ecale. 2016. "Protein Sequence Annotation Tool (PSAT): a centralized web-based meta-server for high-throughput sequence annotations". United Kingdom. https://doi.org/10.1186/s12859-016-0887-y.
@article{osti_1618521,
title = {Protein Sequence Annotation Tool (PSAT): a centralized web-based meta-server for high-throughput sequence annotations},
author = {Leung, Elo and Huang, Amy and Cadag, Eithon and Montana, Aldrin and Soliman, Jan Lorenz and Zhou, Carol L. Ecale},
abstractNote = {In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. As a result, in this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. In conclusion, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.},
doi = {10.1186/s12859-016-0887-y},
journal = {BMC Bioinformatics},
number = 1,
volume = 17,
place = {United Kingdom},
year = {2016},
month = jan
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1186/s12859-016-0887-y

Citation Metrics:
Cited by: 6 works
Citation information provided by
Web of Science
