Missing microbial eukaryotes and misleading meta-omic conclusions
Journal Article · Nature Communications
- Massachusetts Institute of Technology (MIT), Cambridge, MA (United States); Woods Hole Oceanographic Institution, Woods Hole, MA (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Joint Genome Institute
- Woods Hole Oceanographic Institution, Woods Hole, MA (United States); University of South Florida, St. Petersburg, FL (United States)
- Texas A & M University, College Station, TX (United States)
- University of Georgia, Savannah, GA (United States)
- University of Rhode Island, Narragansett, RI (United States)
- Massachusetts Institute of Technology (MIT), Cambridge, MA (United States)
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Joint Genome Institute
- Woods Hole Oceanographic Institution, Woods Hole, MA (United States)
Meta-omics is commonly used for large-scale analyses of microbial eukaryotes, including mapping the distribution of species or taxonomic groups, constructing gene catalogs, and inferring the functional roles and activities of microbial eukaryotes in situ. Here, we explore the potential pitfalls of common approaches to taxonomic annotation of protistan meta-omic datasets. We re-analyze three environmental datasets at three levels of taxonomic hierarchy to illustrate the crucial importance of database completeness and curation for accurate environmental interpretation. We show that taxonomic membership of sequence clusters estimates community composition more accurately than returning exact sequence labels, and that overlap between clusters can address database shortcomings. Clustering approaches can be applied to diverse environments while continuing to exploit the wealth of annotation data collated in databases, and selecting and evaluating these databases is a critical part of correctly annotating protistan taxonomy in environmental datasets. We argue that ongoing curation of genetic resources is crucial for accurately annotating protists in in situ meta-omic datasets. Moreover, we propose that precise taxonomic annotation of meta-omic data is a clustering problem rather than a feasible alignment problem.
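The abstract contrasts returning the exact label attached to a matching reference sequence with inferring taxonomy from the membership of a sequence cluster. The Python sketch below illustrates the cluster-based idea under stated assumptions; it is not the authors' pipeline. It assumes sequences were already clustered by some external tool, represents each reference lineage as a tuple of ranks, and uses a lowest-common-ancestor consensus, which is one common choice among several. All names (`annotate_by_cluster`, `lowest_common_lineage`, `composition_at_rank`) and the example taxa are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Sketch only: assumes clusters were computed beforehand; not the paper's method.
# Hypothetical lineage representation: ranks ordered from broadest to most specific,
# e.g. ("Eukaryota", "Stramenopiles", "Bacillariophyta", "Thalassiosira").
Lineage = Tuple[str, ...]


def lowest_common_lineage(lineages: List[Lineage]) -> Lineage:
    """Return the longest rank prefix shared by every lineage (their lowest common ancestor)."""
    shared: List[str] = []
    # zip stops at the shortest lineage, so ranks missing from any lineage are never compared.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


def annotate_by_cluster(
    query_to_cluster: Dict[str, str],
    cluster_members: Dict[str, List[str]],
    reference_taxonomy: Dict[str, Lineage],
) -> Dict[str, Lineage]:
    """Label each query with the consensus lineage of the annotated references in its cluster."""
    labels: Dict[str, Lineage] = {}
    for query, cluster_id in query_to_cluster.items():
        # Only cluster members that carry a database annotation contribute to the consensus.
        annotated = [
            reference_taxonomy[member]
            for member in cluster_members[cluster_id]
            if member in reference_taxonomy
        ]
        # A cluster with no annotated members yields an empty (unassigned) lineage.
        labels[query] = lowest_common_lineage(annotated)
    return labels


def composition_at_rank(labels: Dict[str, Lineage], rank_index: int) -> Counter:
    """Tally queries per taxon at one rank; queries not resolved to that rank count as 'unassigned'."""
    counts: Counter = Counter()
    for lineage in labels.values():
        taxon = lineage[rank_index] if len(lineage) > rank_index else "unassigned"
        counts[taxon] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy data: two diatom references share cluster c1 with query q1.
    reference_taxonomy = {
        "refA": ("Eukaryota", "Stramenopiles", "Bacillariophyta", "Thalassiosira"),
        "refB": ("Eukaryota", "Stramenopiles", "Bacillariophyta", "Chaetoceros"),
        "refC": ("Eukaryota", "Alveolata", "Dinophyceae", "Karenia"),
    }
    cluster_members = {"c1": ["refA", "refB", "q1"], "c2": ["refC", "q2"], "c3": ["q3"]}
    query_to_cluster = {"q1": "c1", "q2": "c2", "q3": "c3"}

    labels = annotate_by_cluster(query_to_cluster, cluster_members, reference_taxonomy)
    print(labels["q1"])  # ('Eukaryota', 'Stramenopiles', 'Bacillariophyta')
    print(composition_at_rank(labels, 1))
    print(composition_at_rank(labels, 3))
```

Because the consensus stops at the deepest rank on which all annotated cluster members agree, a query whose cluster spans two diatom genera is resolved only to the class level, and a cluster with no annotated members is reported as unassigned rather than forced onto the nearest reference. Tallying at different `rank_index` values loosely mirrors comparing community composition at several levels of the taxonomic hierarchy.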
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- Simons Foundation; USDOE; USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities (SUF); USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Grant/Contract Number:
- AC02-05CH11231; SC0020347
- OSTI ID:
- 2477448
- Alternate ID(s):
- OSTI ID: 2530313
- Journal Information:
- Nature Communications, Vol. 15, Issue 1; ISSN 2041-1723
- Publisher:
- Nature Publishing Group
- Country of Publication:
- United States
- Language:
- English
Similar Records
Communities of microbial eukaryotes in the mammalian gut within the context of environmental eukaryotic diversity
Journal Article · June 19, 2014 · Frontiers in Microbiology · OSTI ID:1392592
Improvement of eukaryotic protein predictions from soil metagenomes
Journal Article · June 15, 2022 · Scientific Data · OSTI ID:1904105