U.S. Department of Energy
Office of Scientific and Technical Information

Missing microbial eukaryotes and misleading meta-omic conclusions

Journal Article · Nature Communications
  1. Massachusetts Institute of Technology (MIT), Cambridge, MA (United States); Woods Hole Oceanographic Institution, Woods Hole, MA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Joint Genome Institute
  2. Woods Hole Oceanographic Institution, Woods Hole, MA (United States); University of South Florida, St. Petersburg, FL (United States)
  3. Texas A & M University, College Station, TX (United States)
  4. University of Georgia, Savannah, GA (United States)
  5. University of Rhode Island, Narragansett, RI (United States)
  6. Massachusetts Institute of Technology (MIT), Cambridge, MA (United States)
  7. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Joint Genome Institute
  8. Woods Hole Oceanographic Institution, Woods Hole, MA (United States)
Meta-omics is commonly used for large-scale analyses of microbial eukaryotes, including species or taxonomic group distribution mapping, gene catalog construction, and inference on the functional roles and activities of microbial eukaryotes in situ. Here, we explore the potential pitfalls of common approaches to taxonomic annotation of protistan meta-omic datasets. We re-analyze three environmental datasets at three levels of taxonomic hierarchy in order to illustrate the crucial importance of database completeness and curation in enabling accurate environmental interpretation. We show that taxonomic membership of sequence clusters estimates community composition more accurately than returning exact sequence labels, and overlap between clusters can address database shortcomings. Clustering approaches can be applied to diverse environments while continuing to exploit the wealth of annotation data collated in databases, and selecting and evaluating these databases is a critical part of correctly annotating protistan taxonomy in environmental datasets. We argue that ongoing curation of genetic resources is crucial in accurately annotating protists in in situ meta-omic datasets. Moreover, we propose that precise taxonomic annotation of meta-omic data is a clustering problem rather than a feasible alignment problem.
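The idea of annotating by cluster membership rather than by exact sequence label can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `min_agreement` threshold, and the toy sequence IDs are all hypothetical, and clustering itself is assumed to have been done by an external tool. Each environmental sequence inherits the deepest lineage prefix shared by a sufficient fraction of the reference sequences in its cluster.

```python
from collections import Counter


def consensus_lineage(lineages, min_agreement=0.75):
    """Return the deepest lineage prefix shared by at least `min_agreement`
    of the given lineages (tuples ordered from broad to specific rank).

    `min_agreement` is a hypothetical parameter; keep it above 0.5 so that
    at most one prefix can pass at each rank.
    """
    lineages = list(lineages)
    consensus = ()
    if not lineages:
        return consensus
    depth = max(len(lineage) for lineage in lineages)
    for rank in range(1, depth + 1):
        # Count full prefixes so identical names under different parents
        # are never merged.
        prefixes = Counter(
            tuple(lineage[:rank]) for lineage in lineages if len(lineage) >= rank
        )
        prefix, count = prefixes.most_common(1)[0]
        if count / len(lineages) < min_agreement:
            break
        consensus = prefix
    return consensus


def annotate_clusters(clusters, reference_taxonomy, min_agreement=0.75):
    """Give every unlabeled cluster member the consensus lineage of the
    reference sequences that share its cluster."""
    annotations = {}
    for members in clusters.values():
        ref_lineages = [
            reference_taxonomy[m] for m in members if m in reference_taxonomy
        ]
        lineage = consensus_lineage(ref_lineages, min_agreement)
        for member in members:
            if member not in reference_taxonomy:
                annotations[member] = lineage
    return annotations


# Toy data: sequence IDs and cluster assignments are invented for illustration.
reference_taxonomy = {
    "ref1": ("Haptophyta", "Prymnesiophyceae", "Phaeocystis", "Phaeocystis globosa"),
    "ref2": ("Haptophyta", "Prymnesiophyceae", "Phaeocystis", "Phaeocystis antarctica"),
    "ref3": ("Haptophyta", "Prymnesiophyceae", "Phaeocystis", "Phaeocystis cordata"),
}
clusters = {
    "c1": ["ref1", "ref2", "ref3", "env1"],
    "c2": ["env2"],  # no reference members, so no annotation is possible
}
annotations = annotate_clusters(clusters, reference_taxonomy)
print(annotations)
```

In this toy case `env1` is resolved only to the genus, because the three reference species disagree at the species rank, while `env2` stays unannotated. An exact-label approach would instead report the species of whichever single reference happened to match best.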
Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Simons Foundation; USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231; SC0020347
OSTI ID:
2477448
Alternate ID(s):
OSTI ID: 2530313
Journal Information:
Nature Communications, Vol. 15, Issue 1; ISSN 2041-1723
Publisher:
Nature Publishing Group
Country of Publication:
United States
Language:
English
