DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Protein annotation as term categorization in the gene ontology using word proximity networks

Journal Article · BMC Bioinformatics
 [1];  [1];  [1];  [1];  [1];  [2];  [3]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. Indiana Univ., Bloomington, IN (United States). School of Informatics, Cognitive Science Program
  3. Indiana Univ., Bloomington, IN (United States). Cognitive Science Program

Background: We participated in BioCreAtIvE Task 2, which addressed two problems: annotating proteins with Gene Ontology (GO) terms based on the text of a given document, and selecting evidence text from that document to justify each annotation. We approached the task using several combinations of two distinct methods: an unsupervised algorithm for expanding the set of words associated with GO nodes, and an annotation methodology that treats annotation as the categorization of terms from a protein's document neighborhood into the GO.

Results: The evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful. Building on this method, we selected appropriate evidence text for a given annotation in 38% of Task 2.1 queries. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, although we show through subsequent analysis that this can be improved with a different parameter setting. In the configuration used to generate the submitted results, our architecture was not very successful on the evidence text component of the task.

Conclusion: The initial results show promise for both methods we explored, and we plan to integrate them more closely to achieve better overall results.
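The abstract does not spell out how the word-expansion step works, so the following is only a minimal sketch under stated assumptions. It assumes a common formulation of a word proximity network in which the weight between two words is the number of documents containing both, divided by the number containing either (a Jaccard co-occurrence measure). It then expands a GO node's seed words with their strongly connected neighbors. The function names `proximity_network` and `expand_terms`, the whitespace tokenizer, and the `threshold` parameter are hypothetical illustrations and are not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def proximity_network(docs):
    """Build a symmetric word proximity network from a list of documents.

    ASSUMPTION (not from the paper): prox(a, b) = |D(a) & D(b)| / |D(a) | D(b)|,
    where D(w) is the set of documents containing word w. Tokenization is a
    naive lowercase whitespace split.
    """
    occurrences = defaultdict(set)  # word -> ids of documents containing it
    for doc_id, text in enumerate(docs):
        for word in set(text.lower().split()):
            occurrences[word].add(doc_id)

    prox = {}
    for a, b in combinations(sorted(occurrences), 2):
        shared = len(occurrences[a] & occurrences[b])
        if shared:  # keep only word pairs that co-occur at least once
            weight = shared / len(occurrences[a] | occurrences[b])
            prox[(a, b)] = weight
            prox[(b, a)] = weight
    return prox


def expand_terms(seed_words, prox, threshold=0.5):
    """Add every word whose proximity to some seed word is >= threshold.

    HYPOTHETICAL expansion rule: a single-hop neighbor expansion of the
    words associated with one GO node.
    """
    expanded = set(seed_words)
    for (a, b), weight in prox.items():
        if a in seed_words and weight >= threshold:
            expanded.add(b)
    return expanded
```

For example, with the toy documents `["kinase activity phosphorylation", "kinase phosphorylation signaling", "membrane transport channel"]` and seed words `{"kinase", "activity"}` taken from a GO term name, a threshold of 0.6 adds only `"phosphorylation"`, which co-occurs with `"kinase"` in every document that contains either word. Raising or lowering the threshold trades precision of the expanded word set against coverage.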

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC52-06NA25396
OSTI ID:
1626313
Journal Information:
BMC Bioinformatics, Vol. 6, Issue Suppl 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English

Cited By (12)

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters journal February 2014
Roles for Text Mining in Protein Function Prediction book January 2014
Multi-label literature classification based on the Gene Ontology graph journal December 2008
Overview of BioCreAtIvE: critical assessment of information extraction for biology journal January 2005
Evaluation of BioCreAtIvE assessment of task 2 journal May 2005
Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks text January 2008
Distance closures on complex networks journal March 2015
Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks journal July 2007
Distance Closures on Complex Networks preprint January 2013
Assessing the Impact of Case Sensitivity and Term Information Gain on Biomedical Concept Recognition journal March 2015
Gene Function Prediction Based on the Gene Ontology Hierarchical Structure journal September 2014
Novel metrics for evaluating the functional coherence of protein groups via protein semantic network journal January 2007