skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data

Abstract

Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. Furthermore, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine,more » instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.« less

Authors:
 [1];  [1];  [2];  [3];  [2];  [1]
  1. Vanderbilt Univ., Nashville, TN (United States)
  2. Univ. of Wisconsin, Madison, WI (United States)
  3. U.S. Dept. of Agriculture, Peoria, IL (United States)
Publication Date:
Research Org.:
Univ. of Wisconsin-Madison, Madison, WI (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
OSTI Identifier:
1361388
Grant/Contract Number:
FC02-07ER64494
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
G3
Additional Journal Information:
Journal Volume: 6; Journal Issue: 12; Journal ID: ISSN 2160-1836
Publisher:
Genetics Society of America
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 59 BASIC BIOLOGICAL SCIENCES; phylogenomics; maximum likelihood; incongruence; genome completeness; nuclear markers

Citation Formats

Shen, Xing -Xing, Zhou, Xiaofan, Kominek, Jacek, Kurtzman, Cletus P., Hittinger, Chris Todd, and Rokas, Antonis. Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data. United States: N. p., 2016. Web. doi:10.1534/g3.116.034744.
Shen, Xing -Xing, Zhou, Xiaofan, Kominek, Jacek, Kurtzman, Cletus P., Hittinger, Chris Todd, & Rokas, Antonis. Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data. United States. doi:10.1534/g3.116.034744.
Shen, Xing -Xing, Zhou, Xiaofan, Kominek, Jacek, Kurtzman, Cletus P., Hittinger, Chris Todd, and Rokas, Antonis. 2016. "Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data". United States. doi:10.1534/g3.116.034744. https://www.osti.gov/servlets/purl/1361388.
@article{osti_1361388,
title = {Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data},
author = {Shen, Xing -Xing and Zhou, Xiaofan and Kominek, Jacek and Kurtzman, Cletus P. and Hittinger, Chris Todd and Rokas, Antonis},
abstractNote = {Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. Furthermore, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.},
doi = {10.1534/g3.116.034744},
journal = {G3},
number = 12,
volume = 6,
place = {United States},
year = 2016,
month = 9
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 1work
Citation information provided by
Web of Science

Save / Share:
  • Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making themmore » highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.« less
  • Biomass and shotgun global proteomics data that reflected relative protein abundances from samples collected during the 2008 experiment at the U.S. Department of Energy Integrated Field-Scale Subsurface Research Challenge site in Rifle, Colorado, provided an unprecedented opportunity to validate a genome-scale metabolic model of Geobacter metallireducens and assess its performance with respect to prediction of metal reduction, biomass yield, and growth rate under dynamic field conditions. Reconstructed from annotated genomic sequence, biochemical, and physiological data, the constraint-based in silico model of G. metallireducens relates an annotated genome sequence to the physiological functions with 697 reactions controlled by 747 enzyme-coding genes.more » Proteomic analysis showed that 180 of the 637 G. metallireducens proteins detected during the 2008 experiment were associated with specific metabolic reactions in the in silico model. When the field-calibrated Fe(III) terminal electron acceptor process reaction in a reactive transport model for the field experiments was replaced with the genome-scale model, the model predicted that the largest metabolic fluxes through the in silico model reactions generally correspond to the highest abundances of proteins that catalyze those reactions. Central metabolism predicted by the model agrees well with protein abundance profiles inferred from proteomic analysis. Model discrepancies with the proteomic data, such as the relatively low fluxes through amino acid transport and metabolism, revealed pathways or flux constraints in the in silico model that could be updated to more accurately predict metabolic processes that occur in the subsurface environment.« less
  • Integrating omics data to refine or make context-specific models is an active field of constraint-based modeling. Proteomics now cover over 95% of the Escherichia coli proteome by mass. Genome-scale models of Metabolism and macromolecular Expression (ME) compute proteome allocation linked to metabolism and fitness. Using proteomics data, we formulated allocation constraints for key proteome sectors in the ME model. The resulting calibrated model effectively computed the “generalist” (wild-type) E. coli proteome and phenotype across diverse growth environments. Across 15 growth conditions, prediction errors for growth rate and metabolic fluxes were 69% and 14% lower, respectively. The sector-constrained ME model thusmore » represents a generalist ME model reflecting both growth rate maximization and “hedging” against uncertain environments and stresses, as indicated by significant enrichment of these sectors for the general stress response sigma factor σS. Finally, the sector constraints represent a general formalism for integrating omics data from any experimental condition into constraint-based ME models. The constraints can be fine-grained (individual proteins) or coarse-grained (functionally-related protein groups) as demonstrated here. Furthermore, this flexible formalism provides an accessible approach for narrowing the gap between the complexity captured by omics data and governing principles of proteome allocation described by systems-level models.« less
  • As part of an ongoing project to generate a mitochondrial database for terrestrial tortoises based on museum specimens, the complete mitochondrial genome sequences of 10 species and a {approx}14 kb sequence from an eleventh species are reported. The sampling of the present study emphasizes Mediterranean tortoises (genus Testudo and their close relatives). Our new sequences are aligned, along with those of two testudinoid turtles from GenBank, Chrysemys picta and Mauremys reevesii, yielding an alignment of 14,858 positions, of which 3,238 are parsimony informative. We develop a phylogenetic taxonomy for Testudo and related species based on well-supported, diagnosable clades. Several well-supportedmore » nodes are recovered, including the monophyly of a restricted Testudo, T. kleinmanni + T. marginata (the Chersus clade), and the placement of the enigmatic African pancake tortoise (Malacochersustornieri) within the predominantly Palearctic greater Testudo group (Testudona tax. nov.). Despite the large amount of sequence reported, there is low statistical support for some nodes within Testudona and Sowe do not propose names for those groups. A preliminary and conservative estimation of divergence times implies a late Miocene diversification for the testudonan clade (6-12 million years ago), matching their first appearance in the fossil record. The multi-continental distribution of testudonan turtles can be explained by the establishment of permanent connections between Europe, Africa, and Asia at this time. The arrival of testudonan turtles to Africa occurred after one or more initial tortoise invasions gave rise to the diverse (>25 species) 'Geochelone complex.'Two unusual genomic features are reported for the mtDNA of one tortoise, M. tornieri: (1) nad4 has a shift of reading frame that we suggest is resolved by translational frameshifting of the mRNA on the ribosome during protein synthesis and (2) there are two copies of the control region and trnF, with the latter having experienced multiple nucleotide substitutions in a pattern suggesting that each is being maintained by selection.« less
  • The magnoliids represent the largest basal angiosperm clade with four orders, 19 families and 8,500 species. Although several recent angiosperm molecular phylogenies have supported the monophyly of magnoliids and suggested relationships among the orders, the limited number of genes examined resulted in only weak support, and these issues remain controversial. Furthermore, considerable incongruence has resulted in phylogenies supporting three different sets of relationships among magnoliids and the two large angiosperm clades, monocots and eudicots. This is one of the most important remaining issues concerning relationships among basal angiosperms. We sequenced the chloroplast genomes of three magnoliids, Drimys (Canellales), Liriodendron (Magnoliales),more » and Piper (Piperales), and used these data in combination with 32 other completed angiosperm chloroplast genomes to assess phylogenetic relationships among magnoliids. The Drimys and Piper chloroplast genomes are nearly identical in size at 160,606 and 160,624 bp, respectively. The genomes include a pair of inverted repeats of 26,649 bp (Drimys) and 27,039 (Piper), separated by a small single copy region of 18,621 (Drimys) and 18,878 (Piper) and a large single copy region of 88,685 bp (Drimys) and 87,666 bp (Piper). The gene order of both taxa is nearly identical to many other unrearranged angiosperm chloroplast genomes, including Calycanthus, the other published magnoliid genome. Comparisons of angiosperm chloroplast genomes indicate that GC content is not uniformly distributed across the genome. Overall GC content ranges from 34-39%, and coding regions have a substantially higher GC content than non-coding regions (both intergenic spacers and introns). Among protein-coding genes, GC content varies by codon position with 1st codon > 2nd codon > 3rd codon, and it varies by functional group with photosynthetic genes having the highest percentage and NADH genes the lowest. Across the genome, GC content is highest in the inverted repeat due to the presence of rRNA genes and lowest in the small single copy region where most NADH genes are located. Phylogenetic analyses using maximum parsimony and maximum likelihood methods were performed on DNA sequences of 61 protein-coding genes. Trees from both analyses provided strong support for the monophyly of magnoliids and two strongly supported groups were identified, the Canellales/Piperales and the Laurales/Magnoliales. The phylogenies also provided moderate to strong support for the basal position of Amborella, and a sister relationship of magnoliids to a clade that includes monocots and eudicots. The complete sequences of three magnoliid chloroplast genomes provide new data from the largest basal angiosperm clade. Evolutionary comparisons of these new genome sequences, combined with other published angiosperm genome, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.« less