OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data

Journal Article · BMC Bioinformatics
 [1];  [2];  [3];  [4];  [5]
  1. Swiss Inst. of Bioinformatics, Lausanne (Switzerland); Centre Medical Universitaire, Geneva (Switzerland)
  2. Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Univ. of Utah, Salt Lake City, UT (United States)
  3. Swiss Inst. of Bioinformatics, Lausanne (Switzerland); Univ. of Geneva (Switzerland)
  4. Centre Medical Universitaire, Geneva (Switzerland); Univ. of Geneva (Switzerland)
  5. Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Background: Metagenomics and metatranscriptomics studies provide valuable insight into the composition and function of microbial populations from diverse environments; however, data processing pipelines that rely on mapping reads to gene catalogs or genome databases of cultured strains yield results that underrepresent the genes and functional potential of uncultured microbes. Recent improvements in sequence assembly methods have eased the reliance on genome databases, thereby allowing the recovery of genomes from uncultured microbes. However, configuring these tools, linking them with advanced binning and annotation tools, and maintaining provenance of the processing remain challenging for researchers.

Results: Here we present ATLAS, a software package for customizable data processing from raw sequence reads to functional and taxonomic annotations, using state-of-the-art tools to assemble, annotate, quantify, and bin metagenome and metatranscriptome data. Genome-centric resolution and abundance estimates are provided for each sample in a dataset. ATLAS is written in Python, and the workflow is implemented in Snakemake; it operates in a Linux environment and is compatible with Python 3.5+ and Anaconda 3+ versions. The source code for ATLAS is freely available and distributed under a BSD-3 license.

Conclusions: ATLAS provides a user-friendly, modular, and customizable Snakemake workflow for metagenome and metatranscriptome data processing; it is easily installable with conda and maintained as open source on GitHub at https://github.com/metagenome-atlas/atlas.

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1646625
Report Number(s):
PNNL-SA-143152
Journal Information:
BMC Bioinformatics, Vol. 21, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 46 works
Citation information provided by
Web of Science
