DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Predictive Models of Genetic Redundancy in Arabidopsis thaliana

Abstract

Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.

Authors:
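 Cusack, Siobhan A. [1]; Wang, Peipei [2]; Lotreck, Serena G. [3]; Moore, Bethany M. [4]; Meng, Fanrui [2]; Conner, Jeffrey K. [5]; Krysan, Patrick J. [6]; Lehti-Shiu, Melissa D. [2]; Shiu, Shin-Han [7]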
  1. Cell and Molecular Biology Program, Michigan State University, East Lansing, MI, USA
  2. Department of Plant Biology, Michigan State University, East Lansing, MI, USA
  3. Department of Plant Biology, Michigan State University, East Lansing, MI, USA; Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
  4. Department of Botany, University of Wisconsin-Madison, Madison, WI, USA
  5. Department of Plant Biology, Michigan State University, East Lansing, MI, USA; Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA; Kellogg Biological Station, Michigan State University, East Lansing, MI, USA
  6. Department of Horticulture, University of Wisconsin-Madison, Madison, WI, USA
  7. Cell and Molecular Biology Program, Michigan State University, East Lansing, MI, USA; Department of Plant Biology, Michigan State University, East Lansing, MI, USA; Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA; Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI, USA
Publication Date:
2021-04-19
Research Org.:
Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER); National Science Foundation (NSF)
OSTI Identifier:
1811204
Alternate Identifier(s):
OSTI ID: 1804107
Grant/Contract Number:  
SC0018409; IOS-1546617; DEB-1655386
Resource Type:
Published Article
Journal Name:
Molecular Biology and Evolution (Online)
Additional Journal Information:
Journal Name: Molecular Biology and Evolution (Online); Journal Volume: 38; Journal Issue: 8; Journal ID: ISSN 1537-1719
Publisher:
Oxford University Press
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; genetic redundancy; molecular evolution; machine learning

Citation Formats

Cusack, Siobhan A., Wang, Peipei, Lotreck, Serena G., Moore, Bethany M., Meng, Fanrui, Conner, Jeffrey K., Krysan, Patrick J., Lehti-Shiu, Melissa D., and Shiu, Shin-Han. Predictive Models of Genetic Redundancy in Arabidopsis thaliana. United States: N. p., 2021. Web. doi:10.1093/molbev/msab111.
Cusack, Siobhan A., Wang, Peipei, Lotreck, Serena G., Moore, Bethany M., Meng, Fanrui, Conner, Jeffrey K., Krysan, Patrick J., Lehti-Shiu, Melissa D., & Shiu, Shin-Han. Predictive Models of Genetic Redundancy in Arabidopsis thaliana. United States. https://doi.org/10.1093/molbev/msab111
Cusack, Siobhan A., Wang, Peipei, Lotreck, Serena G., Moore, Bethany M., Meng, Fanrui, Conner, Jeffrey K., Krysan, Patrick J., Lehti-Shiu, Melissa D., and Shiu, Shin-Han. 2021. "Predictive Models of Genetic Redundancy in Arabidopsis thaliana". United States. https://doi.org/10.1093/molbev/msab111.
@article{osti_1811204,
title = {Predictive Models of Genetic Redundancy in Arabidopsis thaliana},
author = {Cusack, Siobhan A. and Wang, Peipei and Lotreck, Serena G. and Moore, Bethany M. and Meng, Fanrui and Conner, Jeffrey K. and Krysan, Patrick J. and Lehti-Shiu, Melissa D. and Shiu, Shin-Han},
abstractNote = {Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance based on holdout, testing phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and limitations of our studies. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.},
doi = {10.1093/molbev/msab111},
journal = {Molecular Biology and Evolution (Online)},
number = 8,
volume = 38,
place = {United States},
year = {2021},
month = apr
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
https://doi.org/10.1093/molbev/msab111
