OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A RESTful API for accessing microbial community data for MG-RAST

Abstract

Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programmers interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. This API, which uses a RESTful (Representational State Transfer) implementation, is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.

Authors:
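 Wilke, Andreas [1]; Bischof, Jared [1]; Harrison, Travis [1]; Brettin, Tom [2]; D'Souza, Mark [1]; Gerlach, Wolfgang [1]; Matthews, Hunter [1]; Paczian, Tobias [1]; Wilkening, Jared [1]; Glass, Elizabeth M. [1]; Desai, Narayan [1]; Meyer, Folker [1]; Gardner, Paul P. [3]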
  1. Argonne National Lab. (ANL), Lemont, IL (United States). Mathematics and Computer Science Division; Univ. of Chicago, Chicago, IL (United States). Computation Institute.
  2. Argonne National Lab. (ANL), Lemont, IL (United States). Mathematics and Computer Science Division.
  3. Univ. of Canterbury (New Zealand)
Publication Date:
January 2015
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1212400
Alternate Identifier(s):
OSTI ID: 1395022
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
PLoS Computational Biology (Online)
Additional Journal Information:
Journal Volume: 11; Journal Issue: 1; Journal ID: ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; sequence databases; information retrieval; metagenomics; web-based applications; proteases; DNA sequence analysis; database searching; quality control

Citation Formats

Wilke, Andreas, Bischof, Jared, Harrison, Travis, Brettin, Tom, D'Souza, Mark, Gerlach, Wolfgang, Matthews, Hunter, Paczian, Tobias, Wilkening, Jared, Glass, Elizabeth M., Desai, Narayan, Meyer, Folker, and Gardner, Paul P. A RESTful API for accessing microbial community data for MG-RAST. United States: N. p., 2015. Web. doi:10.1371/journal.pcbi.1004008.
Wilke, Andreas, Bischof, Jared, Harrison, Travis, Brettin, Tom, D'Souza, Mark, Gerlach, Wolfgang, Matthews, Hunter, Paczian, Tobias, Wilkening, Jared, Glass, Elizabeth M., Desai, Narayan, Meyer, Folker, & Gardner, Paul P. A RESTful API for accessing microbial community data for MG-RAST. United States. https://doi.org/10.1371/journal.pcbi.1004008
Wilke, Andreas, Bischof, Jared, Harrison, Travis, Brettin, Tom, D'Souza, Mark, Gerlach, Wolfgang, Matthews, Hunter, Paczian, Tobias, Wilkening, Jared, Glass, Elizabeth M., Desai, Narayan, Meyer, Folker, and Gardner, Paul P. 2015. "A RESTful API for accessing microbial community data for MG-RAST". United States. https://doi.org/10.1371/journal.pcbi.1004008. https://www.osti.gov/servlets/purl/1212400.
@article{osti_1212400,
title = {A RESTful API for accessing microbial community data for MG-RAST},
author = {Wilke, Andreas and Bischof, Jared and Harrison, Travis and Brettin, Tom and D'Souza, Mark and Gerlach, Wolfgang and Matthews, Hunter and Paczian, Tobias and Wilkening, Jared and Glass, Elizabeth M. and Desai, Narayan and Meyer, Folker and Gardner, Paul P.},
abstractNote = {Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programmers interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. This API, which uses a RESTful (Representational State Transfer) implementation, is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.},
doi = {10.1371/journal.pcbi.1004008},
url = {https://www.osti.gov/biblio/1212400},
journal = {PLoS Computational Biology (Online)},
issn = {1553-7358},
number = 1,
volume = 11,
place = {United States},
year = {2015},
month = {1}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 14 works
Citation information provided by
Web of Science

Works referenced in this record:

The PhyloFacts FAT-CAT web server: ortholog identification and function prediction using fast approximate tree classification
journal, May 2013


A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE
journal, June 2012


The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
journal, January 2012


InterPro in 2011: new developments in the family and domain prediction database
journal, November 2011


Using clouds for metagenomics: A case study
conference, August 2009


Identifying Protein Domains with the Pfam Database
journal, September 2008


Accessing the SEED Genome Databases via Web Services API: Tools for Programmers
journal, January 2010


The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome
journal, July 2012


The 'rare biosphere': a reality check
journal, September 2009


    Works referencing / citing this record:

    SAMSA: a comprehensive metatranscriptome analysis pipeline
    journal, September 2016


    MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis
    journal, September 2017


    A novel and wide substrate specific polyhydroxyalkanoate (PHA) synthase from unculturable bacteria found in mangrove soil
    journal, December 2017


    IgA regulates the composition and metabolic function of gut microbiota by promoting symbiosis between bacteria
    journal, July 2018


    Functional sequencing read annotation for high precision microbiome analysis
    journal, November 2017


    Exploring bacterial pathogen community dynamics in freshwater beach sediments: A tale of two lakes
    journal, November 2019


    Metagenomic evidence for the presence of phototrophic Gemmatimonadetes bacteria in diverse environments
    journal, January 2016


    Ancient plant DNA in lake sediments
    journal, April 2017


    What Is the Role of Archaea in Plants? New Insights from the Vegetation of Alpine Bogs
    journal, May 2018


    Genomics of the Uncultivated, Periodontitis-Associated Bacterium Tannerella sp. BU045 (Oral Taxon 808)
    journal, June 2018


    Microscale Biosignatures and Abiotic Mineral Authigenesis in Little Hot Creek, California
    journal, May 2018