DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Measuring semantic similarities by combining gene ontology annotations and gene co-function networks

Abstract

Background: Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms.

Results: We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families.

Conclusions: Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited.

Authors:
 Peng, Jiajie [1]; Uygun, Sahra [2]; Kim, Taehyong [3]; Wang, Yadong [4]; Rhee, Seung Y. [3]; Chen, Jin [5]
  1. Harbin Institute of Technology, Harbin (China); Michigan State Univ., East Lansing, MI (United States)
  2. Michigan State Univ., East Lansing, MI (United States)
  3. Carnegie Institution for Science, Stanford, CA (United States)
  4. Harbin Institute of Technology, Harbin (China)
  5. Michigan State Univ., East Lansing, MI (United States)
Publication Date:
February 2015
Research Org.:
Michigan State Univ., East Lansing, MI (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1194164
Grant/Contract Number:  
FG02-91ER20021
Resource Type:
Accepted Manuscript
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Volume: 16; Journal Issue: 1; Journal ID: ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; co-function network; gene ontology; semantic similarity; gene function annotation

Citation Formats

Peng, Jiajie, Uygun, Sahra, Kim, Taehyong, Wang, Yadong, Rhee, Seung Y., and Chen, Jin. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks. United States: N. p., 2015. Web. doi:10.1186/s12859-015-0474-7.
Peng, Jiajie, Uygun, Sahra, Kim, Taehyong, Wang, Yadong, Rhee, Seung Y., & Chen, Jin. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks. United States. https://doi.org/10.1186/s12859-015-0474-7
Peng, Jiajie, Uygun, Sahra, Kim, Taehyong, Wang, Yadong, Rhee, Seung Y., and Chen, Jin. 2015. "Measuring semantic similarities by combining gene ontology annotations and gene co-function networks". United States. https://doi.org/10.1186/s12859-015-0474-7. https://www.osti.gov/servlets/purl/1194164.
@article{osti_1194164,
title = {Measuring semantic similarities by combining gene ontology annotations and gene co-function networks},
author = {Peng, Jiajie and Uygun, Sahra and Kim, Taehyong and Wang, Yadong and Rhee, Seung Y. and Chen, Jin},
abstractNote = {Background: Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. Results: We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Conclusions: Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited.},
doi = {10.1186/s12859-015-0474-7},
journal = {BMC Bioinformatics},
number = 1,
volume = 16,
place = {United States},
year = {2015},
month = {2}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 39 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: An example of GO structure and annotation, gene co-function network, and the functional distance. (A) GO structure and annotation. ta…tj and “root” are GO terms, edges are the ‘is-a’ (solid line) or ‘part-of’ (dashed line) relations between these terms, and {g1g13} in boxes are the sets of genes annotated to the corresponding terms. (B) An example of a co-function network. Each node and edge represents a gene and a functional association between the genes, respectively. The number at each edge represents a confidence score that measures the probability of an interaction to represent a true functional linkage between the genes. (C) An example of the functional distance between two gene sets. Ga (or Gb) is the set of genes annotated to ta (or tb) or its descendants. The number at each edge represents the functional distance between the genes where 0 = functional identity and 1 = no functional relationship.
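
The ingredients named in the Figure 1 caption can be illustrated with a small sketch. The Python below is not the NETSIM formula from the paper, only a toy under stated assumptions: the GO graph, the gene annotations, and the confidence scores are invented for illustration; turning a confidence score c into a distance as 1 - c, and averaging pairwise distances between two gene sets, are simplifying assumptions rather than the authors' definitions. What it does follow from the caption is that Ga is the set of genes annotated to ta or its descendants, and that distances range from 0 (functional identity) to 1 (no functional relationship).

```python
from itertools import product

# Hypothetical toy GO graph: child term -> parent terms ('is-a' / 'part-of' edges).
PARENTS = {"ta": ["root"], "tb": ["root"], "tc": ["ta"], "td": ["ta", "tb"]}

# Hypothetical direct annotations: term -> genes annotated to that term.
ANNOTATIONS = {"ta": {"g1"}, "tb": {"g4"}, "tc": {"g2"}, "td": {"g3"}}

# Hypothetical co-function network: unordered gene pair -> confidence in [0, 1].
CONFIDENCE = {
    frozenset(("g1", "g4")): 0.8,
    frozenset(("g2", "g4")): 0.5,
}


def descendants(term):
    """Return every term that reaches `term` by following child -> parent edges."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                stack.append(child)
    return found


def gene_set(term):
    """Genes annotated to `term` or to any of its descendants (Ga in the caption)."""
    genes = set(ANNOTATIONS.get(term, set()))
    for descendant in descendants(term):
        genes |= ANNOTATIONS.get(descendant, set())
    return genes


def gene_distance(gene_x, gene_y):
    """Assumed mapping: 0 for the same gene, 1 - confidence if linked, otherwise 1."""
    if gene_x == gene_y:
        return 0.0
    return 1.0 - CONFIDENCE.get(frozenset((gene_x, gene_y)), 0.0)


def set_distance(genes_a, genes_b):
    """Assumed aggregate: mean pairwise distance between two gene sets."""
    pairs = list(product(genes_a, genes_b))
    if not pairs:
        return 1.0  # nothing to compare, so treat the sets as unrelated
    return sum(gene_distance(x, y) for x, y in pairs) / len(pairs)


ga = gene_set("ta")  # g1, g2, g3: direct annotation plus descendants tc and td
gb = gene_set("tb")  # g3, g4: direct annotation plus descendant td
print(sorted(ga), sorted(gb), round(set_distance(ga, gb), 3))
```

In this toy, g3 belongs to both sets because its term td has two parents, so the shared gene contributes a distance of 0, while gene pairs with no network edge contribute the maximum distance of 1.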

Works referencing / citing this record:

An online tool for measuring and visualizing phenotype similarities using HPO
journal, August 2018


Erratum to: InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology
journal, March 2017


OAHG: an integrated resource for annotating human genes with multi-level ontologies
journal, October 2016

  • Cheng, Liang; Sun, Jie; Xu, Wanying
  • Scientific Reports, Vol. 6, Issue 1
  • DOI: 10.1038/srep34820

Constructing an integrated gene similarity network for the identification of disease genes
conference, December 2016

  • Zhen Tian, ; Guo, Maozu
  • 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
  • DOI: 10.1109/bibm.2016.7822768

Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach
journal, March 2018


Constructing Networks of Organelle Functional Modules in Arabidopsis
journal, August 2016


Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework
journal, March 2017

  • Weichenberger, Christian X.; Palermo, Antonia; Pramstaller, Peter P.
  • Scientific Reports, Vol. 7, Issue 1
  • DOI: 10.1038/s41598-017-00465-5

Predicting disease-related genes using integrated biomedical networks
journal, January 2017


Investigations on factors influencing HPO-based semantic similarity calculation
journal, September 2017


Measuring disease similarity and predicting disease-related ncRNAs by a novel method
journal, December 2017


Constructing an integrated gene similarity network for the identification of disease genes
journal, September 2017


InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology
journal, August 2016


Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.