OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The metagenomic data life-cycle: standards and best practices

Abstract

Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonised way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (1) material sampling, (2) material sequencing, (3) data analysis, and (4) data archiving & publishing. Taking examples from marine research, we summarise essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption are still needed. We emphasise the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing.

Authors:
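ten Hoopen, Petra; Finn, Robert D.; Bongo, Lars Ailo; Corre, Erwan; Fosso, Bruno; Meyer, Folker; Mitchell, Alex; Pelletier, Eric; Pesole, Graziano; Santamaria, Monica; Willassen, Nils Peder; Cochrane, Guy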
Publication Date:
2017-06-16
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22); European Union - Horizon 2020 Research and Innovation Programme
OSTI Identifier:
1390783
DOE Contract Number:
AC02-06CH11357
Resource Type:
Journal Article
Resource Relation:
Journal Name: GigaScience; Journal Volume: 6; Journal Issue: 8
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION; Metagenomics; best practice; data analysis; metadata; sampling; sequencing; standard

Citation Formats

ten Hoopen, Petra, Finn, Robert D., Bongo, Lars Ailo, Corre, Erwan, Fosso, Bruno, Meyer, Folker, Mitchell, Alex, Pelletier, Eric, Pesole, Graziano, Santamaria, Monica, Willassen, Nils Peder, and Cochrane, Guy. The metagenomic data life-cycle: standards and best practices. United States: N. p., 2017. Web. doi:10.1093/gigascience/gix047.
ten Hoopen, Petra, Finn, Robert D., Bongo, Lars Ailo, Corre, Erwan, Fosso, Bruno, Meyer, Folker, Mitchell, Alex, Pelletier, Eric, Pesole, Graziano, Santamaria, Monica, Willassen, Nils Peder, & Cochrane, Guy. (2017). The metagenomic data life-cycle: standards and best practices. United States. doi:10.1093/gigascience/gix047.
ten Hoopen, Petra, Finn, Robert D., Bongo, Lars Ailo, Corre, Erwan, Fosso, Bruno, Meyer, Folker, Mitchell, Alex, Pelletier, Eric, Pesole, Graziano, Santamaria, Monica, Willassen, Nils Peder, and Cochrane, Guy. 2017. "The metagenomic data life-cycle: standards and best practices". United States. doi:10.1093/gigascience/gix047.
@article{osti_1390783,
title = {The metagenomic data life-cycle: standards and best practices},
author = {ten Hoopen, Petra and Finn, Robert D. and Bongo, Lars Ailo and Corre, Erwan and Fosso, Bruno and Meyer, Folker and Mitchell, Alex and Pelletier, Eric and Pesole, Graziano and Santamaria, Monica and Willassen, Nils Peder and Cochrane, Guy},
abstractNote = {Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonised way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (1) material sampling, (2) material sequencing, (3) data analysis, and (4) data archiving \& publishing. Taking examples from marine research, we summarise essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption are still needed. We emphasise the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing.},
doi = {10.1093/gigascience/gix047},
journal = {GigaScience},
number = 8,
volume = 6,
place = {United States},
year = {2017},
month = jun
}