DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Ergatis: a web interface and scalable software system for bioinformatics workflows

Abstract

Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user-friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects.

Authors:
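Orvis, Joshua [1]; Crabtree, Jonathan [1]; Galens, Kevin [1]; Gussman, Aaron [1]; Inman, Jason M. [2]; Lee, Eduardo [3]; Nampally, Sreenath [2]; Riley, David [1]; Sundaram, Jaideep P. [4]; Felix, Victor [1]; Whitty, Brett [5]; Mahurkar, Anup [1]; Wortman, Jennifer [1]; White, Owen [1]; Angiuoli, Samuel V. [6]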
  1. Univ. of Maryland Baltimore County (UMBC), Baltimore, MD (United States). Inst. for Genome Sciences
  2. J. Craig Venter Inst., Inc., Rockville, MD (United States)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  4. J. Craig Venter Inst., Inc., Rockville, MD (United States); Georgetown Univ., Washington, DC (United States). Dept. of Biology. Computational Genomics Lab.
  5. Michigan State Univ., East Lansing, MI (United States). Dept. of Plant Biology
  6. Univ. of Maryland Baltimore County (UMBC), Baltimore, MD (United States). Inst. for Genome Sciences; Univ. of Maryland, College Park, MD (United States). Center for Bioinformatics and Computational Biology
Publication Date:
2010-04-22
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
OSTI Identifier:
1625266
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Bioinformatics
Additional Journal Information:
Journal Volume: 26; Journal Issue: 12; Journal ID: ISSN 1367-4803
Publisher:
International Society for Computational Biology - Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Computer Science; Mathematical & Computational Biology; Mathematics

Citation Formats

Orvis, Joshua, Crabtree, Jonathan, Galens, Kevin, Gussman, Aaron, Inman, Jason M., Lee, Eduardo, Nampally, Sreenath, Riley, David, Sundaram, Jaideep P., Felix, Victor, Whitty, Brett, Mahurkar, Anup, Wortman, Jennifer, White, Owen, and Angiuoli, Samuel V. Ergatis: a web interface and scalable software system for bioinformatics workflows. United States: N. p., 2010. Web. doi:10.1093/bioinformatics/btq167.
Orvis, Joshua, Crabtree, Jonathan, Galens, Kevin, Gussman, Aaron, Inman, Jason M., Lee, Eduardo, Nampally, Sreenath, Riley, David, Sundaram, Jaideep P., Felix, Victor, Whitty, Brett, Mahurkar, Anup, Wortman, Jennifer, White, Owen, & Angiuoli, Samuel V. (2010). Ergatis: a web interface and scalable software system for bioinformatics workflows. United States. https://doi.org/10.1093/bioinformatics/btq167
Orvis, Joshua, Crabtree, Jonathan, Galens, Kevin, Gussman, Aaron, Inman, Jason M., Lee, Eduardo, Nampally, Sreenath, Riley, David, Sundaram, Jaideep P., Felix, Victor, Whitty, Brett, Mahurkar, Anup, Wortman, Jennifer, White, Owen, and Angiuoli, Samuel V. 2010. "Ergatis: a web interface and scalable software system for bioinformatics workflows". United States. https://doi.org/10.1093/bioinformatics/btq167. https://www.osti.gov/servlets/purl/1625266.
@article{osti_1625266,
title = {Ergatis: a web interface and scalable software system for bioinformatics workflows},
author = {Orvis, Joshua and Crabtree, Jonathan and Galens, Kevin and Gussman, Aaron and Inman, Jason M. and Lee, Eduardo and Nampally, Sreenath and Riley, David and Sundaram, Jaideep P. and Felix, Victor and Whitty, Brett and Mahurkar, Anup and Wortman, Jennifer and White, Owen and Angiuoli, Samuel V.},
abstractNote = {Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user-friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects.},
doi = {10.1093/bioinformatics/btq167},
journal = {Bioinformatics},
number = 12,
volume = 26,
place = {United States},
year = {2010},
month = apr
}

Works referenced in this record:

An Ergatis-based prokaryotic genome annotation web server
journal, March 2010


GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses
journal, July 2005

  • Besemer, J.; Borodovsky, M.
  • Nucleic Acids Research, Vol. 33, Issue Web Server
  • DOI: 10.1093/nar/gki487

Semi-automatic web service composition for the life sciences using the BioMoby semantic web framework
journal, October 2008

  • DiBernardo, Michael; Pottinger, Rachel; Wilkinson, Mark
  • Journal of Biomedical Informatics, Vol. 41, Issue 5
  • DOI: 10.1016/j.jbi.2008.02.005

Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis
journal, January 2007


Genome Sequence of Aedes aegypti, a Major Arbovirus Vector
journal, June 2007


Locating proteins in the cell using TargetP, SignalP and related tools
journal, April 2007

  • Emanuelsson, Olof; Brunak, Søren; von Heijne, Gunnar
  • Nature Protocols, Vol. 2, Issue 4
  • DOI: 10.1038/nprot.2007.131

Automation of in-silico data analysis processes through workflow management systems
journal, October 2007


Comparative Genomics of Emerging Human Ehrlichiosis Agents
journal, January 2006


Taverna: a tool for the composition and enactment of bioinformatics workflows
journal, June 2004


A Chado case study: an ontology-based modular schema for representing genome-associated biological information
journal, July 2007


ISYS: a decentralized, component-based approach to the integration of heterogeneous bioinformatics resources
journal, January 2001


BioMOBY: An open source biological web services proposal
journal, January 2002


GeneWise and Genomewise
journal, May 2004


Pegasys: software for executing and integrating analyses of biological sequences
journal, January 2004

  • Shah, Sohrab P.; He, David YM; Sawkins, Jessica N.
  • BMC Bioinformatics, Vol. 5, Issue 1, p. 40
  • DOI: 10.1186/1471-2105-5-40

Pathema: a clade-specific bioinformatics resource center for pathogen research
journal, October 2009

  • Brinkac, Lauren M.; Davidsen, Tanja; Beck, Erin
  • Nucleic Acids Research, Vol. 38, Issue suppl_1
  • DOI: 10.1093/nar/gkp850

Workflow based framework for life science informatics
journal, October 2007


Comparative genomics: the bacterial pan-genome
journal, October 2008

  • Tettelin, Hervé; Riley, David; Cattuto, Ciro
  • Current Opinion in Microbiology, Vol. 11, Issue 5
  • DOI: 10.1016/j.mib.2008.09.006

Comparative Genomics of Trypanosomatid Parasitic Protozoa
journal, July 2005


Sybil: Methods and Software for Multiple Genome Comparison and Visualization
book, January 2007


Gene Ontology: tool for the unification of biology
journal, May 2000

  • Ashburner, Michael; Ball, Catherine A.; Blake, Judith A.
  • Nature Genetics, Vol. 25, Issue 1
  • DOI: 10.1038/75556

The TIGR Rice Genome Annotation Resource: improvements and new features
journal, January 2007

  • Ouyang, S.; Zhu, W.; Hamilton, J.
  • Nucleic Acids Research, Vol. 35, Issue Database
  • DOI: 10.1093/nar/gkl976

Galaxy: A platform for interactive large-scale genome analysis
journal, September 2005


Wildfire: distributed, Grid-enabled workflow construction and execution
journal, January 2005

  • Tang, Francis; Chua, Ching Lian; Ho, Liang-Yoong
  • BMC Bioinformatics, Vol. 6, Issue 1
  • DOI: 10.1186/1471-2105-6-69

Genome Sequence of the Deep-Rooted Yersinia pestis Strain Angola Reveals New Insights into the Evolution and Pangenome of the Plague Bacterium
journal, January 2010

  • Eppinger, Mark; Worsham, Patricia L.; Nikolich, Mikeljon P.
  • Journal of Bacteriology, Vol. 192, Issue 6
  • DOI: 10.1128/JB.01518-09

Works referencing / citing this record:

Draft Genome Sequence of Enterobacter cloacae Strain S611
journal, November 2014


Responses of the Human Gut Escherichia coli Population to Pathogen and Antibiotic Disturbances
journal, August 2018


High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis
journal, September 2014


Armadillo 1.1: An Original Workflow Platform for Designing and Conducting Phylogenetic Analysis and Simulations
journal, January 2012


Transcriptional Variation of Diverse Enteropathogenic Escherichia coli Isolates under Virulence-Inducing Conditions
journal, August 2017


Bacterial Endosymbiosis in a Chordate Host: Long-Term Co-Evolution and Conservation of Secondary Metabolism
journal, December 2013


ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data
journal, February 2017


Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases
journal, January 2013


The IGS Standard Operating Procedure for Automated Prokaryotic Annotation
journal, April 2011

  • Galens, Kevin; Orvis, Joshua; Daugherty, Sean
  • Standards in Genomic Sciences, Vol. 4, Issue 2
  • DOI: 10.4056/sigs.1223234

Measuring metagenome diversity and similarity with Hill numbers
journal, July 2018


Jflow: a workflow management system for web applications
journal, October 2015


Using Sybil for interactive comparative genomics of microbes on the web
journal, November 2011


Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing
journal, October 2011


ReVac: a reverse vaccinology computational pipeline for prioritization of prokaryotic protein vaccine candidates
journal, December 2019


Molgenis-impute: imputation pipeline in a box
journal, August 2015

  • Kanterakis, Alexandros; Deelen, Patrick; van Dijk, Freerk
  • BMC Research Notes, Vol. 8, Issue 1
  • DOI: 10.1186/s13104-015-1309-3

Draft Genome Sequence of Thauera sp. Strain SWB20, Isolated from a Singapore Wastewater Treatment Facility Using Gel Microdroplets
journal, April 2015

  • Dichosa, Armand E. K.; Davenport, Karen W.; Li, Po-E
  • Genome Announcements, Vol. 3, Issue 2
  • DOI: 10.1128/genomea.00132-15

CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline
journal, April 2017


A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines
journal, February 2011


Draft Genome Sequence of Pseudomonas putida Strain S610, a Seed-Borne Bacterium of Wheat
journal, October 2013


A Case Study for Large-Scale Human Microbiome Analysis Using JCVI’s Metagenomics Reports (METAREP)
journal, June 2012


CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing
journal, August 2011

  • Angiuoli, Samuel V.; Matalka, Malcolm; Gussman, Aaron
  • BMC Bioinformatics, Vol. 12, Issue 1
  • DOI: 10.1186/1471-2105-12-356

Cloudgene: A graphical execution platform for MapReduce programs on private and public clouds
journal, August 2012

  • Schönherr, Sebastian; Forer, Lukas; Weißensteiner, Hansi
  • BMC Bioinformatics, Vol. 13, Issue 1
  • DOI: 10.1186/1471-2105-13-200

A graph-based approach for designing extensible pipelines
journal, July 2012

  • Rodrigues, Maíra R.; Magalhães, Wagner CS; Machado, Moara
  • BMC Bioinformatics, Vol. 13, Issue 1
  • DOI: 10.1186/1471-2105-13-163

Inter- and intra-specific pan-genomes of Borrelia burgdorferi sensu lato: genome stability and adaptive radiation
journal, January 2013

  • Mongodin, Emmanuel F.; Casjens, Sherwood R.; Bruno, John F.
  • BMC Genomics, Vol. 14, Issue 1
  • DOI: 10.1186/1471-2164-14-693

VIROME: a standard operating procedure for analysis of viral metagenome sequences
journal, July 2012

  • Wommack, K. Eric; Bhavsar, Jaysheel; Polson, Shawn W.
  • Standards in Genomic Sciences, Vol. 6, Issue 3
  • DOI: 10.4056/sigs.2945050

Small but Sufficient: the Rhodococcus Phage RRH1 Has the Smallest Known Siphoviridae Genome at 14.2 Kilobases
journal, October 2011

  • Petrovski, S.; Dyson, Z. A.; Seviour, R. J.
  • Journal of Virology, Vol. 86, Issue 1
  • DOI: 10.1128/jvi.05460-11

JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing
journal, August 2015


Strains, functions and dynamics in the expanded Human Microbiome Project
journal, September 2017

  • Lloyd-Price, Jason; Mahurkar, Anup; Rahnavard, Gholamali
  • Nature, Vol. 550, Issue 7674
  • DOI: 10.1038/nature23889

The Wasp System: An open source environment for managing and analyzing genomic data
journal, December 2012


MaPSeq, A Service-Oriented Architecture for Genomics Research within an Academic Biomedical Research Institution
journal, July 2015


Tripal: a construction toolkit for online genome databases
journal, January 2011


CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota
journal, February 2013

  • White, James Robert; Maddox, Cynthia; White, Owen
  • Microbiome, Vol. 1, Issue 1
  • DOI: 10.1186/2049-2618-1-6

Kepler WebView: A Lightweight, Portable Framework for Constructing Real-time Web Interfaces of Scientific Workflows
journal, January 2016


Agile parallel bioinformatics workflow management using Pwrake
journal, September 2011

  • Mishima, Hiroyuki; Sasaki, Kensaku; Tanaka, Masahiro
  • BMC Research Notes, Vol. 4, Issue 1
  • DOI: 10.1186/1756-0500-4-331

I-ATAC: interactive pipeline for the management and pre-processing of ATAC-seq samples
journal, January 2017


Composite mobile genetic elements disseminating macrolide resistance in Streptococcus pneumoniae
journal, February 2015


Computational ecosystems for data-driven medical genomics
journal, January 2010

  • Almeida, Jonas S.
  • Genome Medicine, Vol. 2, Issue 9
  • DOI: 10.1186/gm188

Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme
journal, January 2010

  • Ovaska, Kristian; Laakso, Marko; Haapa-Paananen, Saija
  • Genome Medicine, Vol. 2, Issue 9
  • DOI: 10.1186/gm186

Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data
journal, February 2016


Comparative genomic analysis and molecular examination of the diversity of enterotoxigenic Escherichia coli isolates from Chile
journal, November 2019


NG6: Integrated next generation sequencing storage and processing environment
journal, January 2012

  • Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas
  • BMC Genomics, Vol. 13, Issue 1
  • DOI: 10.1186/1471-2164-13-462

Blocking Yersiniabactin Import Attenuates Extraintestinal Pathogenic Escherichia coli in Cystitis and Pyelonephritis and Represents a Novel Target To Prevent Urinary Tract Infection
journal, April 2015

  • Brumbaugh, Ariel R.; Smith, Sara N.; Subashchandrabose, Sargurunathan
  • Infection and Immunity, Vol. 83, Issue 4
  • DOI: 10.1128/iai.02904-14

Population dynamics of Escherichia coli in the gastrointestinal tracts of Tanzanian children
posted_content, January 2018

  • Richter, Taylor K. S.; Hazen, Tracy H.; Lam, Diana
  • bioRxiv
  • DOI: 10.1101/294934

Investigating the Relatedness of Enteroinvasive Escherichia coli to Other E. coli and Shigella Isolates by Using Comparative Genomics
journal, June 2016

  • Hazen, Tracy H.; Leonard, Susan R.; Lampel, Keith A.
  • Infection and Immunity, Vol. 84, Issue 8
  • DOI: 10.1128/iai.00350-16

Draft Genome Sequences of Two Staphylococcus warneri Clinical Isolates, Strains SMA0023-04 (UGA3) and SMA0670-05 (UGA28), from Siaya County Referral Hospital, Siaya, Kenya
journal, April 2019

  • Xie, Gary; Cheng, Qiuying; Daligault, Hajnalka
  • Microbiology Resource Announcements, Vol. 8, Issue 15
  • DOI: 10.1128/mra.01595-18

Phosphotyrosine-Mediated Regulation of Enterohemorrhagic Escherichia coli Virulence
journal, March 2018


Transcriptome analysis of a Pseudomonas aeruginosa sn-glycerol-3-phosphate dehydrogenase mutant reveals a disruption in bioenergetics
journal, April 2018

  • Shuman, Jon; Giles, Tyler Xavier; Carroll, Leslie
  • Microbiology, Vol. 164, Issue 4
  • DOI: 10.1099/mic.0.000646

SMITH: a LIMS for handling next-generation sequencing workflows
journal, November 2014


A comparative analysis of library prep approaches for sequencing low input translatome samples
journal, September 2018