Title: Eukaryotic genomes from a global metagenomic data set illuminate trophic modes and biogeography of ocean plankton

Journal Article · mBio (Online)
[1]; [2]; [3]; [1]; [4]; [5]; [6]
  1. Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
  2. Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
  3. Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA; MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Cambridge and Woods Hole, Massachusetts, USA
  4. Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
  5. Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
  6. Population Health and Reproduction, University of California, Davis, Davis, California, USA

ABSTRACT Metagenomics is a powerful method for interpreting the ecological roles and physiological capabilities of mixed microbial communities. Yet many tools for processing metagenomic data are neither designed to consider eukaryotes nor built to handle the growing volume of sequence data. EukHeist is an automated pipeline to retrieve eukaryotic and prokaryotic metagenome-assembled genomes (MAGs) from large-scale metagenomic sequence data sets. We developed the EukHeist workflow specifically to process large amounts of metagenomic and/or metatranscriptomic sequence data in an automated and reproducible fashion. Here, we applied EukHeist to the large-size fraction data (0.8–2,000 µm) from Tara Oceans to recover both eukaryotic and prokaryotic MAGs, which we refer to as TOPAZ (Tara Oceans Particle-Associated MAGs). The TOPAZ MAGs consisted of >900 environmentally relevant eukaryotic MAGs and >4,000 bacterial and archaeal MAGs. The bacterial and archaeal TOPAZ MAGs expand upon the phylogenetic diversity of likely particle- and host-associated taxa. We use these MAGs to demonstrate an approach to infer the putative trophic mode of the recovered eukaryotic MAGs. We also identify ecological cohorts of co-occurring MAGs, which are driven by specific environmental factors and putative host-microbe associations. Together, these data add to a growing number of resources of environmentally relevant eukaryotic genomic information. Complementary and expanded databases of MAGs, such as those provided through scalable pipelines like EukHeist, stand to advance our understanding of eukaryotic diversity through increased coverage of genomic representatives across the tree of life.

IMPORTANCE Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain. Environmental sequencing enables researchers to document naturally occurring protistan communities without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To more completely capture the genomic content of mixed protistan populations, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We provide insight into which protistan communities are present in the ocean and their trophic roles. Scalable computational tools, like EukHeist, may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers’ efforts to leverage MAG databases for addressing ecological questions, resolving evolutionary relationships, and discovering potentially novel biodiversity.

Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0020347; OCE-1924492; OCE-0939654
OSTI ID:
2205379
Journal Information:
mBio (Online), Vol. 14, Issue 6; ISSN 2150-7511
Publisher:
American Society for Microbiology
Country of Publication:
United States
Language:
English
