U.S. Department of Energy
Office of Scientific and Technical Information

Machine learning enables identification of an alternative yeast galactose utilization pathway

Journal Article · Proceedings of the National Academy of Sciences of the United States of America
 [1];  [2];  [3];  [4];  [2];  [5];  [6];  [7];  [2];  [8]
  1. Vanderbilt University, Nashville, TN (United States); Vanderbilt University Department of Biological Sciences and Evolutionary Studies Initiative
  2. University of Wisconsin-Madison, WI (United States)
  3. Vanderbilt University, Nashville, TN (United States); University of North Carolina at Charlotte, NC (United States)
  4. University of Wisconsin-Madison, WI (United States); Villanova University, PA (United States)
  5. South China Agricultural University, Guangzhou (China)
  6. Zhejiang University, Hangzhou (China)
  7. Westerdijk Fungal Biodiversity Institute, Utrecht (The Netherlands)
  8. Vanderbilt University, Nashville, TN (United States)
How genomic differences contribute to phenotypic differences is a major question in biology. The recently characterized genomes, isolation environments, and qualitative patterns of growth on 122 sources and conditions of 1,154 strains from 1,049 fungal species (nearly all known) in the yeast subphylum Saccharomycotina provide a powerful, yet complex, dataset for addressing this question. We used a random forest algorithm trained on these genomic, metabolic, and environmental data to predict growth on several carbon sources with high accuracy. Known structural genes involved in assimilation of these sources and presence/absence patterns of growth in other sources were important features contributing to prediction accuracy. By further examining growth on galactose, we found that it can be predicted with high accuracy from either genomic (92.2%) or growth data (82.6%) but not from isolation environment data (65.6%). Prediction accuracy was even higher (93.3%) when we combined genomic and growth data. After the GALactose utilization genes, the most important feature for predicting growth on galactose was growth on galactitol, raising the hypothesis that several species in two orders, Serinales and Pichiales (containing the emerging pathogen Candida auris and the genus Ogataea, respectively), have an alternative galactose utilization pathway because they lack the GAL genes. Growth and biochemical assays confirmed that several of these species utilize galactose through an alternative oxidoreductive D-galactose pathway, rather than the canonical GAL pathway. Machine learning approaches are powerful for investigating the evolution of the yeast genotype–phenotype map, and their application will uncover novel biology, even in well-studied traits.
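The abstract's approach (train a random forest on gene presence and growth traits, then ask which features drive the prediction) can be illustrated with a small sketch. This is not the authors' pipeline or data. The feature names, the toy rule that a species grows on galactose if it has the GAL genes or grows on galactitol, the sample sizes, and the use of permutation importance as the importance measure are all assumptions made for illustration. The forest is a minimal pure-Python stand-in for a real library implementation.

```python
import random

# Hypothetical binary features; the trait_* columns are uninformative by design.
FEATURES = ["GAL_genes", "galactitol_growth", "trait_a", "trait_b", "trait_c"]

def make_data(n, rng):
    """Toy species table with binary features and a growth-on-galactose label."""
    rows, labels = [], []
    for _ in range(n):
        x = [int(rng.random() < 0.5), int(rng.random() < 0.3)]
        x += [rng.randint(0, 1) for _ in range(3)]
        rows.append(x)
        labels.append(int(x[0] or x[1]))  # assumed toy rule, not the paper's data
    return rows, labels

def gini(labels):
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels, rng, n_try=2):
    """Grow one tree; each node compares up to n_try randomly chosen usable features."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    order = list(range(len(FEATURES)))
    rng.shuffle(order)
    best, tried = None, 0
    for f in order:
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, f)
        tried += 1
        if tried == n_try:
            break
    if best is None:  # no feature separates the rows: majority-vote leaf
        return int(2 * sum(labels) >= len(labels))
    f = best[1]
    parts = {0: ([], []), 1: ([], [])}
    for x, y in zip(rows, labels):
        parts[x[f]][0].append(x)
        parts[x[f]][1].append(y)
    return (f, build_tree(*parts[0], rng, n_try), build_tree(*parts[1], rng, n_try))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, lo, hi = tree
        tree = hi if x[f] else lo
    return tree

def fit_forest(rows, labels, rng, n_trees=25):
    """Fit each tree on a bootstrap resample of the training rows."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict(forest, x):
    votes = sum(predict_tree(t, x) for t in forest)
    return int(2 * votes > len(forest))

def accuracy(forest, rows, labels):
    return sum(predict(forest, x) == y for x, y in zip(rows, labels)) / len(rows)

def permutation_importance(forest, rows, labels, rng):
    """Accuracy lost when one feature column is shuffled across species."""
    base = accuracy(forest, rows, labels)
    drops = {}
    for f, name in enumerate(FEATURES):
        col = [x[f] for x in rows]
        rng.shuffle(col)
        shuffled = [x[:f] + [v] + x[f + 1:] for x, v in zip(rows, col)]
        drops[name] = base - accuracy(forest, shuffled, labels)
    return drops

rng = random.Random(0)
train_x, train_y = make_data(400, rng)
test_x, test_y = make_data(300, rng)
forest = fit_forest(train_x, train_y, rng)
test_accuracy = accuracy(forest, test_x, test_y)
importance = permutation_importance(forest, test_x, test_y, rng)
ranked = sorted(importance, key=importance.get, reverse=True)
print(f"held-out accuracy: {test_accuracy:.3f}")
for name in ranked:
    print(f"{name:18s} {importance[name]:+.3f}")
```

In this toy setting the two informative features should rank above the uninformative ones. A growth trait that ranks highly alongside known structural genes is the kind of signal that, in the study, pointed to an alternative pathway worth testing in the lab.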
Research Organization:
Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
SC0018409
OSTI ID:
2370376
Journal Information:
Proceedings of the National Academy of Sciences of the United States of America, Vol. 121, Issue 18; ISSN 0027-8424
Publisher:
National Academy of Sciences
Country of Publication:
United States
Language:
English

