DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

Abstract

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA format. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
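
The functional comparison described in the abstract can be pictured as a set difference over predicted EC numbers: enzymes predicted for the query genome but absent from a reference genome are candidates for a distinguishing pathway. The sketch below illustrates only that idea. It is not PSAT's or EC2KEGG's implementation; the function names, the input layout (protein ID mapped to a list of EC strings), and the example predictions are all hypothetical placeholders.

```python
import re

# An EC number has four dot-separated fields; partial assignments use '-'
# for unknown trailing fields (e.g. 3.5.-.-).
EC_PATTERN = re.compile(r"^\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-)$")


def ec_set(predictions):
    """Collect the distinct, well-formed EC numbers from {protein_id: [ec, ...]}."""
    return {
        ec
        for ecs in predictions.values()
        for ec in ecs
        if EC_PATTERN.match(ec)
    }


def ecs_only_in_query(query, reference):
    """Return the sorted EC numbers predicted for the query but not the reference."""
    return sorted(ec_set(query) - ec_set(reference))


# Hypothetical placeholder predictions, not results from the paper.
query_predictions = {
    "query_prot_A": ["1.14.12.12", "2.7.1.1"],
    "query_prot_B": ["1.1.1.1"],
    "query_prot_C": ["not-an-ec"],  # malformed entries are ignored
}
reference_predictions = {
    "ref_prot_1": ["2.7.1.1"],
    "ref_prot_2": ["1.1.1.1", "3.5.-.-"],
}

print(ecs_only_in_query(query_predictions, reference_predictions))
```

In practice, the EC numbers left over after such a comparison would still need to be mapped onto pathway maps and checked by hand before a pathway is called putative.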

Authors:
 Leung, Elo [1]; Huang, Amy [2]; Cadag, Eithon [3]; Montana, Aldrin [1]; Soliman, Jan Lorenz [4]; Zhou, Carol L. Ecale [2]
  1. Lawrence Livermore National Security, Livermore, CA (United States); Personalis, Menlo Park, CA (United States)
  2. Lawrence Livermore National Security, Livermore, CA (United States)
  3. Lawrence Livermore National Security, Livermore, CA (United States); Capella Biosciences, Palo Alto, CA (United States)
  4. Lawrence Livermore National Security, Livermore, CA (United States); LinkedIn, Mountain View, CA (United States)
Publication Date:
Research Org.:
Lawrence Livermore National Security
Sponsoring Org.:
USDOE
OSTI Identifier:
1238774
Grant/Contract Number:  
AC52-07NA27344; PE0603384BP-B0946791; SCW1039
Resource Type:
Accepted Manuscript
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Volume: 17; Journal Issue: 1; Journal ID: ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING

Citation Formats

Leung, Elo, Huang, Amy, Cadag, Eithon, Montana, Aldrin, Soliman, Jan Lorenz, and Zhou, Carol L. Ecale. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations. United States: N. p., 2016. Web. doi:10.1186/s12859-016-0887-y.
Leung, Elo, Huang, Amy, Cadag, Eithon, Montana, Aldrin, Soliman, Jan Lorenz, & Zhou, Carol L. Ecale. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations. United States. doi:10.1186/s12859-016-0887-y.
Leung, Elo, Huang, Amy, Cadag, Eithon, Montana, Aldrin, Soliman, Jan Lorenz, and Zhou, Carol L. Ecale. Wed . "Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations". United States. doi:10.1186/s12859-016-0887-y. https://www.osti.gov/servlets/purl/1238774.
@article{osti_1238774,
title = {Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations},
author = {Leung, Elo and Huang, Amy and Cadag, Eithon and Montana, Aldrin and Soliman, Jan Lorenz and Zhou, Carol L. Ecale},
abstractNote = {In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA format. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.},
doi = {10.1186/s12859-016-0887-y},
journal = {BMC Bioinformatics},
number = 1,
volume = 17,
place = {United States},
year = {2016},
month = {1}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 3 works
Citation information provided by
Web of Science

Works referenced in this record:

InterProScan 5: genome-scale protein function classification
journal, January 2014


iGepros: an integrated gene and protein annotation server for biological nature exploration
journal, December 2011


Combination of degradation pathways for naphthalene utilization in Rhodococcus sp. strain TFB
journal, December 2013

  • Tomás-Gallardo, Laura; Gómez-Álvarez, Helena; Santero, Eduardo
  • Microbial Biotechnology, Vol. 7, Issue 2
  • DOI: 10.1111/1751-7915.12096

What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins
journal, April 2014


EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes
journal, August 2012


Unraveling the Complexities of Life Sciences Data
journal, March 2013

  • Higdon, Roger; Haynes, Winston; Stanberry, Larissa
  • Big Data, Vol. 1, Issue 1
  • DOI: 10.1089/big.2012.1505

The IGS Standard Operating Procedure for Automated Prokaryotic Annotation
journal, April 2011

  • Galens, Kevin; Orvis, Joshua; Daugherty, Sean
  • Standards in Genomic Sciences, Vol. 4, Issue 2
  • DOI: 10.4056/sigs.1223234

MvirDB--a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications
journal, January 2007

  • Zhou, C. E.; Smith, J.; Lam, M.
  • Nucleic Acids Research, Vol. 35, Issue Database
  • DOI: 10.1093/nar/gkl791

Cloud computing and the DNA data race
journal, July 2010

  • Schatz, Michael C.; Langmead, Ben; Salzberg, Steven L.
  • Nature Biotechnology, Vol. 28, Issue 7
  • DOI: 10.1038/nbt0710-691

Data, information, knowledge and principle: back to metabolism in KEGG
journal, November 2013

  • Kanehisa, Minoru; Goto, Susumu; Sato, Yoko
  • Nucleic Acids Research, Vol. 42, Issue D1
  • DOI: 10.1093/nar/gkt1076

STRING v9.1: protein-protein interaction networks, with increased coverage and integration
journal, November 2012

  • Franceschini, Andrea; Szklarczyk, Damian; Frankild, Sune
  • Nucleic Acids Research, Vol. 41, Issue D1
  • DOI: 10.1093/nar/gks1094

Optimizing high performance computing workflow for protein functional annotation
journal, April 2014

  • Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan
  • Concurrency and Computation: Practice and Experience, Vol. 26, Issue 13
  • DOI: 10.1002/cpe.3264

BLAST+: architecture and applications
journal, January 2009

  • Camacho, Christiam; Coulouris, George; Avagyan, Vahram
  • BMC Bioinformatics, Vol. 10, Issue 1
  • DOI: 10.1186/1471-2105-10-421

SignalP 4.0: discriminating signal peptides from transmembrane regions
journal, September 2011

  • Petersen, Thomas Nordahl; Brunak, Søren; von Heijne, Gunnar
  • Nature Methods, Vol. 8, Issue 10
  • DOI: 10.1038/nmeth.1701

The Earth Microbiome project: successes and aspirations
journal, August 2014


ASAP: automated sequence annotation pipeline for web-based updating of sequence information with a local dynamic database
journal, March 2003


MESSA: MEta-Server for protein Sequence Analysis
journal, October 2012


Draft Genome Sequence of the Naphthalene Degrader Herbaspirillum sp. Strain RV1423
journal, March 2014


The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)
journal, November 2013

  • Overbeek, Ross; Olson, Robert; Pusch, Gordon D.
  • Nucleic Acids Research, Vol. 42, Issue D1
  • DOI: 10.1093/nar/gkt1226

IMG 4 version of the integrated microbial genomes comparative analysis system
journal, October 2013

  • Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna
  • Nucleic Acids Research, Vol. 42, Issue D1
  • DOI: 10.1093/nar/gkt963

The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases
journal, November 2013

  • Caspi, Ron; Altman, Tomer; Billington, Richard
  • Nucleic Acids Research, Vol. 42, Issue D1
  • DOI: 10.1093/nar/gkt1103

Towards the integration, annotation and association of historical microarray experiments with RNA-seq
journal, January 2013

  • Chavan, Shweta S.; Bauer, Michael A.; Peterson, Erich A.
  • BMC Bioinformatics, Vol. 14, Issue Suppl 14
  • DOI: 10.1186/1471-2105-14-S14-S4

The RAST Server: Rapid Annotations using Subsystems Technology
journal, January 2008

  • Aziz, Ramy K.; Bartels, Daniela; Best, Aaron A.
  • BMC Genomics, Vol. 9, Issue 1, Article No. 75
  • DOI: 10.1186/1471-2164-9-75

EC2KEGG: a command line tool for comparison of metabolic pathways
journal, September 2014

