Predictive Models of Genetic Redundancy in Arabidopsis thaliana
Abstract
Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.
- Authors:
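- Cusack, Siobhan A.; Wang, Peipei; Lotreck, Serena G.; Moore, Bethany M.; Meng, Fanrui; Conner, Jeffrey K.; Krysan, Patrick J.; Lehti-Shiu, Melissa D.; Shiu, Shin-Han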
- Cell and Molecular Biology Program, Michigan State University, East Lansing, MI, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA, Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
- Department of Botany, University of Wisconsin-Madison, Madison, WI, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA, Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA, Kellogg Biological Station, Michigan State University, East Lansing, MI, USA
- Department of Horticulture, University of Wisconsin-Madison, Madison, WI, USA
- Cell and Molecular Biology Program, Michigan State University, East Lansing, MI, USA, Department of Plant Biology, Michigan State University, East Lansing, MI, USA, Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA, Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA
- Publication Date:
- 2021-04-19
- Research Org.:
- Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF)
- OSTI Identifier:
- 1811204
- Alternate Identifier(s):
- OSTI ID: 1804107
- Grant/Contract Number:
- SC0018409; IOS-1546617; DEB-1655386
- Resource Type:
- Published Article
- Journal Name:
- Molecular Biology and Evolution (Online)
- Additional Journal Information:
- Journal Name: Molecular Biology and Evolution (Online); Journal Volume: 38; Journal Issue: 8; Journal ID: ISSN 1537-1719
- Publisher:
- Oxford University Press
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; genetic redundancy; molecular evolution; machine learning
Citation Formats
Cusack, Siobhan A., Wang, Peipei, Lotreck, Serena G., Moore, Bethany M., Meng, Fanrui, Conner, Jeffrey K., Krysan, Patrick J., Lehti-Shiu, Melissa D., and Shiu, Shin-Han. Predictive Models of Genetic Redundancy in Arabidopsis thaliana. United States: N. p., 2021. Web. doi:10.1093/molbev/msab111.
Cusack, Siobhan A., Wang, Peipei, Lotreck, Serena G., Moore, Bethany M., Meng, Fanrui, Conner, Jeffrey K., Krysan, Patrick J., Lehti-Shiu, Melissa D., & Shiu, Shin-Han. Predictive Models of Genetic Redundancy in Arabidopsis thaliana. United States. https://doi.org/10.1093/molbev/msab111
Cusack, Siobhan A., Wang, Peipei, Lotreck, Serena G., Moore, Bethany M., Meng, Fanrui, Conner, Jeffrey K., Krysan, Patrick J., Lehti-Shiu, Melissa D., and Shiu, Shin-Han. 2021. "Predictive Models of Genetic Redundancy in Arabidopsis thaliana". United States. https://doi.org/10.1093/molbev/msab111.
@article{osti_1811204,
title = {Predictive Models of Genetic Redundancy in Arabidopsis thaliana},
author = {Cusack, Siobhan A. and Wang, Peipei and Lotreck, Serena G. and Moore, Bethany M. and Meng, Fanrui and Conner, Jeffrey K. and Krysan, Patrick J. and Lehti-Shiu, Melissa D. and Shiu, Shin-Han},
abstractNote = {Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.},
doi = {10.1093/molbev/msab111},
journal = {Molecular Biology and Evolution (Online)},
number = 8,
volume = 38,
place = {United States},
year = {2021},
month = {4}
}
https://doi.org/10.1093/molbev/msab111