Summary: BIOINFORMATICS Vol. 19 Suppl. 1 2003, pages i340–i349
DOI: 10.1093/bioinformatics/btg1047
Extracting synonymous gene and protein terms
from biological literature
Hong Yu and Eugene Agichtein
Department of Computer Science, Columbia University, New York, NY, USA
Received on January 6, 2003; accepted on February 20, 2003
ABSTRACT
Motivation: Genes and proteins are often associated with
multiple names. More names are added as new functional
or structural information is discovered. Because authors
can use any one of the known names for a gene or protein,
information retrieval and extraction would benefit from
identifying the gene and protein terms that are synonyms
for the same substance.
Results: We have explored four complementary approaches
for extracting gene and protein synonyms from
text, namely the unsupervised, partially supervised, and
supervised machine-learning techniques, as well as the
manual knowledge-based approach. We report results