FUN-PROSE: A deep learning approach to predict condition-specific gene expression in fungi
- University of Illinois Urbana-Champaign, IL (United States); Carl R. Woese Institute for Genomic Biology, Urbana, IL (United States)
- University of Illinois Urbana-Champaign, IL (United States); Carl R. Woese Institute for Genomic Biology, Urbana, IL (United States); The Gladstone Institute of Data Science and Biotechnology, San Francisco, CA (United States)
- Carl R. Woese Institute for Genomic Biology, Urbana, IL (United States); University of Illinois Urbana-Champaign, IL (United States)
- University of Illinois Urbana-Champaign, IL (United States); Carl R. Woese Institute for Genomic Biology, Urbana, IL (United States); Argonne National Laboratory (ANL), Argonne, IL (United States)
The mRNA levels of all genes in a genome are a critical piece of information defining the overall state of the cell in a given environmental condition. Being able to reconstruct such condition-specific expression in fungal genomes is particularly important for metabolically engineering these organisms to produce desired chemicals under industrially scalable conditions. Most previous deep learning approaches focused on predicting the average expression level of a gene from its promoter sequence, ignoring its variation across conditions. Here we present FUN-PROSE, a deep learning model trained to predict differential expression of individual genes across various conditions using their promoter sequences and the expression levels of all transcription factors (TFs). We train and test our model on three fungal species and obtain correlations between predicted and observed condition-specific gene expression as high as 0.85. We then interpret our model to extract promoter sequence motifs responsible for variable expression of individual genes. We also carry out input feature importance analysis to connect individual transcription factors to their gene targets. A sizeable fraction of both the sequence motifs and the TF-gene interactions learned by our model agree with previously known biological information, while the rest correspond to either novel biological facts or indirect correlations.
- Research Organization:
- University of Illinois Urbana-Champaign, IL (United States); Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), Urbana, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER)
- Grant/Contract Number:
- SC0018420; AC02-06-CH11357
- OSTI ID:
- 2212841
- Alternate ID(s):
- OSTI ID: 2311189
- Journal Information:
- PLoS Computational Biology (Online), Vol. 19, Issue 11; ISSN 1553-7358
- Publisher:
- Public Library of Science
- Country of Publication:
- United States
- Language:
- English