DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report

Abstract

A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the “functional similarity” between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the “ortholog conjecture” (or, more properly, the “ortholog functional conservation hypothesis”). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches.
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an “open world assumption” (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.
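The argument that annotation overlap can rank orthologs below paralogs even when the orthologs share the same function can be illustrated with a toy example. This is a minimal sketch, not the metric of Nehrt et al.: it assumes a plain Jaccard index over sets of annotation terms, and every gene and term (`term_A`, `term_B`, ...) is a hypothetical placeholder rather than a real GO identifier. The orthologs are assumed to be annotated from complementary types of experiment in the two species, while the paralogs are assumed to be annotated from the same type of experiment within one species.

```python
# Toy illustration with HYPOTHETICAL genes and terms ("term_A" etc. are
# placeholders, not real GO IDs). The similarity measure is an ASSUMED plain
# Jaccard index, not the metric used by Nehrt et al.


def jaccard(a: set, b: set) -> float:
    """Return |A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical human/mouse ortholog pair with the SAME underlying function.
ortholog_truth = {"term_A", "term_B", "term_C", "term_D"}
human_annot = {"term_A", "term_B"}  # assumed: terms from one type of experiment
mouse_annot = {"term_C", "term_D"}  # assumed: terms from a different type

# Hypothetical mouse paralog pair whose functions only partly overlap,
# but which were both studied with the same type of experiment.
paralog1_truth = {"term_C", "term_D", "term_E"}
paralog2_truth = {"term_C", "term_D", "term_F"}
paralog1_annot = {"term_C", "term_D"}
paralog2_annot = {"term_C", "term_D"}

# Annotations are assumed correct but incomplete (open-world assumption):
# a missing term means "not yet shown", not "absent", so each annotation
# set is a subset of the corresponding true function set.
pairs = [
    (human_annot, ortholog_truth),
    (mouse_annot, ortholog_truth),
    (paralog1_annot, paralog1_truth),
    (paralog2_annot, paralog2_truth),
]
assert all(annot <= truth for annot, truth in pairs)

ortholog_annot_sim = jaccard(human_annot, mouse_annot)
ortholog_true_sim = jaccard(ortholog_truth, ortholog_truth)
paralog_annot_sim = jaccard(paralog1_annot, paralog2_annot)
paralog_true_sim = jaccard(paralog1_truth, paralog2_truth)

print(f"orthologs: annotations {ortholog_annot_sim:.2f}, true {ortholog_true_sim:.2f}")
print(f"paralogs:  annotations {paralog_annot_sim:.2f}, true {paralog_true_sim:.2f}")
```

Running it prints an annotation score of 0.00 against a true score of 1.00 for the orthologs, and 1.00 against 0.50 for the paralogs. In this toy case the annotation-based score ranks the paralogs above the orthologs while the (normally unobservable) true functions rank them the other way round; the inversion comes entirely from which experiments were performed, which is the kind of ascertainment effect described above.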

Authors:
 Thomas, Paul D. [1]; Wood, Valerie [2]; Mungall, Christopher J. [3]; Lewis, Suzanna E. [3]; Blake, Judith A. [4]
  1. Univ. of California, Los Angeles, CA (United States). Dept. of Preventive Medicine. Division of Bioinformatics
  2. Univ. of Cambridge (United Kingdom). Dept. of Biochemistry. Cambridge Systems Biology Centre
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Genomics Division
  4. The Jackson Lab., Bar Harbor, ME (United States). Bioinformatics and Computational Biology
Publication Date:
February 16, 2012
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division; National Institutes of Health (NIH)
OSTI Identifier:
1627221
Grant/Contract Number:  
AC02-05CH11231; P41 HG002273; R01 GM081084
Resource Type:
Accepted Manuscript
Journal Name:
PLoS Computational Biology (Online)
Additional Journal Information:
Journal Name: PLoS Computational Biology (Online); Journal Volume: 8; Journal Issue: 2; Journal ID: ISSN 1553-7358
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Biochemistry & Molecular Biology; Mathematical & Computational Biology

Citation Formats

Thomas, Paul D., Wood, Valerie, Mungall, Christopher J., Lewis, Suzanna E., and Blake, Judith A. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report. United States: N. p., 2012. Web. doi:10.1371/journal.pcbi.1002386.
Thomas, Paul D., Wood, Valerie, Mungall, Christopher J., Lewis, Suzanna E., & Blake, Judith A. On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report. United States. https://doi.org/10.1371/journal.pcbi.1002386
Thomas, Paul D., Wood, Valerie, Mungall, Christopher J., Lewis, Suzanna E., and Blake, Judith A. 2012. "On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report". United States. https://doi.org/10.1371/journal.pcbi.1002386. https://www.osti.gov/servlets/purl/1627221.
@article{osti_1627221,
title = {On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report},
author = {Thomas, Paul D. and Wood, Valerie and Mungall, Christopher J. and Lewis, Suzanna E. and Blake, Judith A.},
abstractNote = {A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the ‘‘functional similarity’’ between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the ‘‘ortholog conjecture’’ (or, more properly, the ‘‘ortholog functional conservation hypothesis’’). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an ‘‘open world assumption’’ (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.},
doi = {10.1371/journal.pcbi.1002386},
journal = {PLoS Computational Biology (Online)},
number = 2,
volume = 8,
place = {United States},
year = {2012},
month = {feb}
}

Works referenced in this record:

The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics
journal, November 2010

  • Blake, J. A.; Bult, C. J.; Kadin, J. A.
  • Nucleic Acids Research, Vol. 39, Issue Database
  • DOI: 10.1093/nar/gkq1008

Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals
journal, June 2011


A novel DNA damage recognition protein in Schizosaccharomyces pombe
journal, April 2006


How confident can we be that orthologs are similar, but paralogs differ?
journal, May 2009


The 3′ Ends of Mature Transcripts Are Generated by a Processosome Complex in Fission Yeast Mitochondria
journal, April 2008

  • Hoffmann, Bastian; Nickel, Jens; Speer, Falk
  • Journal of Molecular Biology, Vol. 377, Issue 4
  • DOI: 10.1016/j.jmb.2008.01.038

Control of a Kinesin-Cargo Linkage Mechanism by JNK Pathway Kinases
journal, August 2007


Physiological and Molecular Basis of Thyroid Hormone Action
journal, July 2001


Gene Ontology: tool for the unification of biology
journal, May 2000

  • Ashburner, Michael; Ball, Catherine A.; Blake, Judith A.
  • Nature Genetics, Vol. 25, Issue 1
  • DOI: 10.1038/75556

The GOA database in 2009--an integrated Gene Ontology Annotation resource
journal, January 2009

  • Barrell, D.; Dimmer, E.; Huntley, R. P.
  • Nucleic Acids Research, Vol. 37, Issue Database
  • DOI: 10.1093/nar/gkn803

Gene Ontology annotations: what they mean and where they come from
journal, January 2008

  • Hill, David P.; Smith, Barry; McAndrews-Hill, Monica S.
  • BMC Bioinformatics, Vol. 9, Issue Suppl 5
  • DOI: 10.1186/1471-2105-9-S5-S2

The Gene Ontology in 2010: extensions and refinements
journal, January 2010

  • The Gene Ontology Consortium
  • Nucleic Acids Research, Vol. 38, Issue suppl_1, p. D331-D335
  • DOI: 10.1093/nar/gkp1018

Protein Evolution by Molecular Tinkering: Diversification of the Nuclear Receptor Superfamily from a Ligand-Dependent Ancestor
journal, October 2010


Evolution of Hormone-Receptor Complexity by Molecular Exploitation
journal, April 2006


Distinguishing Homologous from Analogous Proteins
journal, June 1970

  • Fitch, Walter M.
  • Systematic Zoology, Vol. 19, Issue 2
  • DOI: 10.2307/2412448

When orthologs diverge between human and mouse
journal, June 2011

  • Gharib, W. H.; Robinson-Rechavi, M.
  • Briefings in Bioinformatics, Vol. 12, Issue 5
  • DOI: 10.1093/bib/bbr031

Motor Proteins: Trafficking and Signaling Collide
journal, September 2007


Estrogen receptors and human disease
journal, March 2006

  • Deroo, B. J.
  • Journal of Clinical Investigation, Vol. 116, Issue 3
  • DOI: 10.1172/jci27987

Works referencing / citing this record:

Genome-Wide Analysis of Protein Disorder in Arabidopsis thaliana: Implications for Plant Environmental Adaptation
journal, February 2013


Identifying mouse developmental essential genes using machine learning
journal, December 2018

  • Tian, David; Wenlock, Stephanie; Kabir, Mitra
  • Disease Models & Mechanisms, Vol. 11, Issue 12
  • DOI: 10.1242/dmm.034546

Standardized benchmarking in the quest for orthologs
journal, April 2016

  • Altenhoff, Adrian M.; Boeckmann, Brigitte; Capella-Gutierrez, Salvador
  • Nature Methods, Vol. 13, Issue 5
  • DOI: 10.1038/nmeth.3830

ARTDeco: automatic readthrough transcription detection
journal, May 2020


Biological interpretation of genome-wide association studies using predicted gene functions
journal, January 2015

  • Pers, Tune H.; Karjalainen, Juha M.; Chan, Yingleong
  • Nature Communications, Vol. 6, Issue 1
  • DOI: 10.1038/ncomms6890

Functional and evolutionary implications of gene orthology
journal, April 2013

  • Gabaldón, Toni; Koonin, Eugene V.
  • Nature Reviews Genetics, Vol. 14, Issue 5
  • DOI: 10.1038/nrg3456

Protein Function Prediction: Problems and Pitfalls
journal, September 2015


An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework
journal, January 2016


Big data and other challenges in the quest for orthologs
journal, July 2014


Interspecies gene function prediction using semantic similarity
journal, December 2016


Semantic Similarity from Natural Language and Ontology Analysis
journal, May 2015


Semantic Similarity from Natural Language and Ontology Analysis
text, January 2017


A Tight Link between Orthologs and Bidirectional Best Hits in Bacterial and Archaeal Genomes
journal, November 2012

  • Wolf, Yuri I.; Koonin, Eugene V.
  • Genome Biology and Evolution, Vol. 4, Issue 12
  • DOI: 10.1093/gbe/evs100

Functional and structural profiles of GST gene family from three Populus species reveal the sequence–function decoupling of orthologous genes
journal, September 2018

  • Yang, Qi; Han, Xue‐Min; Gu, Jin‐Ke
  • New Phytologist, Vol. 221, Issue 2
  • DOI: 10.1111/nph.15430

Gene ontology improves template selection in comparative protein docking
journal, December 2018

  • Hadarovich, Anna; Anishchenko, Ivan; Tuzikov, Alexander V.
  • Proteins: Structure, Function, and Bioinformatics, Vol. 87, Issue 3
  • DOI: 10.1002/prot.25645

Conserved syntenic clusters of protein coding genes are missing in birds
journal, December 2014


The Ortholog Conjecture Revisited: the Value of Orthologs and Paralogs in Function Prediction
journal, December 2019


Pairwise comparisons across species are problematic when analyzing functional genomic data
journal, January 2018

  • Dunn, Casey W.; Zapata, Felipe; Munro, Catriona
  • Proceedings of the National Academy of Sciences, Vol. 115, Issue 3
  • DOI: 10.1073/pnas.1707515115

Accurate prediction of orthologs in the presence of divergence after duplication
journal, June 2018


Accurate prediction of orthologs in the presence of divergence after duplication
journal, April 2018

  • Lafond, Manuel; Miardan, Mona Meghdari; Sankoff, David
  • Bioinformatics
  • DOI: 10.1101/294405

Human Monogenic Disease Genes Have Frequently Functionally Redundant Paralogs
journal, May 2013


The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction
journal, July 2020


OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis
journal, November 2012

  • Whiteside, Matthew D.; Winsor, Geoffrey L.; Laird, Matthew R.
  • Nucleic Acids Research, Vol. 41, Issue D1
  • DOI: 10.1093/nar/gks1241

OrthoList 2: A New Comparative Genomic Analysis of Human and Caenorhabditis elegans Genes
journal, August 2018


Standardized benchmarking in the quest for orthologs
text, January 2016


The Ortholog Conjecture Is Untestable by the Current Gene Ontology but Is Supported by RNA Sequencing Data
journal, November 2012


Gene Family Level Comparative Analysis of Gene Expression in Mammals Validates the Ortholog Conjecture
journal, March 2014

  • Rogozin, Igor B.; Managadze, David; Shabalina, Svetlana A.
  • Genome Biology and Evolution, Vol. 6, Issue 4
  • DOI: 10.1093/gbe/evu051

Protein Function Prediction Using Deep Restricted Boltzmann Machines
journal, January 2017

  • Zou, Xianchun; Wang, Guijun; Yu, Guoxian
  • BioMed Research International, Vol. 2017
  • DOI: 10.1155/2017/1729301

The case of Iranian immigrants in the greater Toronto area: a qualitative study
journal, January 2012


Progress and challenges in the computational prediction of gene function using networks
journal, September 2012


Phyletic Profiling with Cliques of Orthologs Is Enhanced by Signatures of Paralogy Relationships
journal, January 2013


Ten Quick Tips for Using the Gene Ontology
journal, November 2013


WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning
journal, November 2016


Quickly Finding Orthologs as Reciprocal Best Hits with BLAT, LAST, and UBLAST: How Much Do We Miss?
journal, July 2014


In Silico Analysis and Experimental Validation of Active Compounds from Cichorium intybus L. Ameliorating Liver Injury
journal, September 2015

  • Li, Guo-Yu; Zheng, Ya-Xin; Sun, Fu-Zhou
  • International Journal of Molecular Sciences, Vol. 16, Issue 9
  • DOI: 10.3390/ijms160922190

Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs
text, January 2012


Phylogenetic Profiling: How Much Input Data Is Enough?
text, January 2015


Evaluating the adaptive evolutionary convergence of carnivorous plant taxa through functional genomics
journal, January 2018