DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns

Abstract

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
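
The rule-based check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the rule set, the gene names, and the function `find_violations` are all hypothetical, and the sketch ignores propagation of annotations to ancestor terms in the ontology, which a real workflow would need to handle.

```python
from itertools import combinations

# Hypothetical rule set: unordered pairs of process terms that are not
# expected to be co-annotated to the same gene product. The single pair
# here is the example given in the abstract.
EXCLUSION_RULES = {
    frozenset({"amino acid metabolism", "cytokinesis"}),
}


def find_violations(annotations, rules=EXCLUSION_RULES):
    """Return sorted (gene, term_a, term_b) triples that break a rule.

    `annotations` maps a gene identifier to an iterable of term labels.
    """
    violations = []
    for gene, terms in annotations.items():
        # Check every unordered pair of distinct terms on this gene.
        for term_a, term_b in combinations(sorted(set(terms)), 2):
            if frozenset({term_a, term_b}) in rules:
                violations.append((gene, term_a, term_b))
    return sorted(violations)


# Toy annotation table (invented gene names, not real data).
toy_annotations = {
    "geneA": ["amino acid metabolism", "cytokinesis"],
    "geneB": ["cytokinesis", "chromosome segregation"],
}

for gene, term_a, term_b in find_violations(toy_annotations):
    print(f"{gene}: '{term_a}' co-annotated with '{term_b}'")
```

Each flagged pair is only a candidate for review; as the abstract notes, a curator still has to trace the annotation to its source to decide whether the error lies in curation, ontology structure or an automated pipeline.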

Authors:
Wood, Valerie [1]; Carbon, Seth [2]; Harris, Midori A. [1]; Lock, Antonia [3]; Engel, Stacia R. [4]; Hill, David P. [5]; Van Auken, Kimberly [6]; Attrill, Helen [1]; Feuermann, Marc [7]; Gaudet, Pascale [7]; Lovering, Ruth C. [3]; Poux, Sylvain [7]; Rutherford, Kim M. [1]; Mungall, Christopher J. [2]
  1. Univ. of Cambridge (United Kingdom)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  3. Univ. College London (United Kingdom)
  4. Stanford Univ., CA (United States)
  5. The Jackson Lab., Bar Harbor, ME (United States)
  6. California Institute of Technology (CalTech), Pasadena, CA (United States)
  7. Swiss Inst. of Bioinformatics, Geneva (Switzerland)
Publication Date:
2020-09-02
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Biological and Environmental Research (BER); Wellcome Trust; National Human Genome Research Institute (NHGRI); UK Medical Research Council; UK Biotechnology and Biological Sciences Research Council; Alzheimer’s Research UK
OSTI Identifier:
1713252
Grant/Contract Number:  
AC02-05CH11231; 104967/Z/14/Z; U41 HG002273; U41 HG001315; U24 HG010859; U24 HG002223; MR/S000453/1; BB/P024602/1; MR/N030117/1; ARUK-NAS2017A-1
Resource Type:
Accepted Manuscript
Journal Name:
Open Biology
Additional Journal Information:
Journal Volume: 10; Journal Issue: 9; Journal ID: ISSN 2046-2441
Publisher:
The Royal Society
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; gene ontology; quality control; annotation; biocuration

Citation Formats

Wood, Valerie, Carbon, Seth, Harris, Midori A., Lock, Antonia, Engel, Stacia R., Hill, David P., Van Auken, Kimberly, Attrill, Helen, Feuermann, Marc, Gaudet, Pascale, Lovering, Ruth C., Poux, Sylvain, Rutherford, Kim M., and Mungall, Christopher J. Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns. United States: N. p., 2020. Web. doi:10.1098/rsob.200149.
Wood, Valerie, Carbon, Seth, Harris, Midori A., Lock, Antonia, Engel, Stacia R., Hill, David P., Van Auken, Kimberly, Attrill, Helen, Feuermann, Marc, Gaudet, Pascale, Lovering, Ruth C., Poux, Sylvain, Rutherford, Kim M., & Mungall, Christopher J. (2020). Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns. United States. https://doi.org/10.1098/rsob.200149
Wood, Valerie, Carbon, Seth, Harris, Midori A., Lock, Antonia, Engel, Stacia R., Hill, David P., Van Auken, Kimberly, Attrill, Helen, Feuermann, Marc, Gaudet, Pascale, Lovering, Ruth C., Poux, Sylvain, Rutherford, Kim M., and Mungall, Christopher J. 2020. "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns". United States. https://doi.org/10.1098/rsob.200149. https://www.osti.gov/servlets/purl/1713252.
@article{osti_1713252,
title = {Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns},
author = {Wood, Valerie and Carbon, Seth and Harris, Midori A. and Lock, Antonia and Engel, Stacia R. and Hill, David P. and Van Auken, Kimberly and Attrill, Helen and Feuermann, Marc and Gaudet, Pascale and Lovering, Ruth C. and Poux, Sylvain and Rutherford, Kim M. and Mungall, Christopher J.},
abstractNote = {Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.},
doi = {10.1098/rsob.200149},
journal = {Open Biology},
number = 9,
volume = 10,
place = {United States},
year = {2020},
month = sep
}
