Addressing the pervasive scarcity of structural annotation in eukaryotic algae

Journal Article · Scientific Reports
Abstract

Despite a continuous increase in algal genome sequencing, structural annotations of most algal genome assemblies remain unavailable. This pervasive scarcity of genome annotation has restricted rigorous investigation of these genomic resources and may have precipitated misleading biological interpretations. However, the annotation process for eukaryotic algal species is often challenging as genomic resources and transcriptomic evidence are not always available. To address this challenge, we benchmark cutting-edge gene prediction methods that can be generalized for a broad range of non-model eukaryotes. Using the most accurate methods selected based on high-quality algal genomes, we predict structural annotations for 135 unannotated algal genomes. Using previously available genomic data pooled together with new data obtained in this study, we identified the core orthologous genes and the multi-gene phylogeny of eukaryotic algae, including previously unexplored algal species. This study not only provides a benchmark for the use of structural annotation methods on a variety of non-model eukaryotes, but also compensates for missing data in the current spectrum of algal genomic resources. These results bring us one step closer to the full potential of eukaryotic algal genomics.

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE; USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)
Grant/Contract Number:
89233218CNA000001
OSTI ID:
1922580
Alternate ID(s):
OSTI ID: 1924408
Report Number(s):
LA-UR-21-30119; 1687; PII: 27881
Journal Information:
Scientific Reports, Vol. 13, Issue 1; ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
