U.S. Department of Energy
Office of Scientific and Technical Information

Semantic role labeling for protein transport predicates

Journal Article · BMC Bioinformatics
 [1];  [2];  [3];  [4]
  1. Univ. of Colorado, Boulder, CO (United States). Computer Science Dept.; DOE/OSTI
  2. National Library of Medicine (NLM), Bethesda, MD (United States). National Center for Biotechnology Information
  3. Univ. of Colorado, Boulder, CO (United States). Computer Science Dept.
  4. Univ. of Colorado, Aurora, CO (United States). School of Medicine. Center for Computational Pharmacology

Background: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data from newswire domains. Here, we report on an SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs, manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role.

Results: We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones.

Conclusion: We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part-of-speech tags, we were able to address the challenges posed by the new domain data and build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.
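
The word-chunking formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the example sentence, its role spans, and the helper names `bio_encode` and `f_measure` are assumptions made for illustration. The sketch shows how role spans become per-word beginning/inside/outside labels (the targets a word-level classifier would predict) and how the reported F-measure follows from the reported precision and recall.

```python
from typing import List, Tuple


def bio_encode(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn role spans (start, end_exclusive, role) into one B/I/O label per word."""
    labels = ["O"] * len(tokens)  # every word starts outside any role
    for start, end, role in spans:
        labels[start] = f"B-{role}"  # first word of the role
        for i in range(start + 1, end):
            labels[i] = f"I-{role}"  # remaining words inside the role
    return labels


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence and annotation, not taken from the GeneRIF corpus.
tokens = "BAX translocates from the cytosol to mitochondria".split()
spans = [(0, 1, "PATIENT"), (3, 5, "ORIGIN"), (6, 7, "DESTINATION")]
labels = bio_encode(tokens, spans)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")

# Precision and recall as reported in the abstract (manual protein boundaries).
print(round(f_measure(87.6, 79.0), 1))
```

With the reported 87.6% precision and 79.0% recall, the harmonic mean rounds to 83.1, matching the F-measure quoted in the conclusion. Labeling at the word level is what lets role boundaries differ from syntactic phrase boundaries, since no parse is needed to define the units being classified.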

Research Organization:
Oak Ridge Associated Univ., Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC05-06OR23100
OSTI ID:
1626355
Journal Information:
BMC Bioinformatics, Vol. 9, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English