OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Using DEDICOM for completely unsupervised part-of-speech tagging.

Abstract

A standard and widespread approach to part-of-speech tagging is based on Hidden Markov Models (HMMs). An alternative approach, pioneered by Schuetze (1993), induces parts of speech from scratch using singular value decomposition (SVD). We introduce DEDICOM as an alternative to SVD for part-of-speech induction. DEDICOM retains the advantages of SVD in that it is completely unsupervised: no prior knowledge is required to induce either the tagset or the associations of terms with tags. However, unlike SVD, it is also fully compatible with the HMM framework, in that it can be used to estimate emission- and transition-probability matrices which can then be used as the input for an HMM. We apply the DEDICOM method to the CONLL corpus (CONLL 2000) and compare the output of DEDICOM to the part-of-speech tags given in the corpus, and find that the correlation (almost 0.5) is quite high. Using DEDICOM, we also estimate part-of-speech ambiguity for each term, and find that these estimates correlate highly with part-of-speech ambiguity as measured in the original corpus (around 0.88). Finally, we show how the output of DEDICOM can be evaluated and compared against the more familiar output of supervised HMM-based tagging.

Authors:
Chew, Peter A.; Bader, Brett William; Rozovskaya, Alla [1]
  1. (University of Illinois, Urbana, IL)
Publication Date:
2009-02
Research Org.:
Sandia National Laboratories
Sponsoring Org.:
USDOE
OSTI Identifier:
978915
Report Number(s):
SAND2009-0842
TRN: US201011%%3
DOE Contract Number:  
AC04-94AL85000
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; SPEECH; D CODES; IDENTIFICATION SYSTEMS; Cognitive science.; Speech-Mathematical models.; Speech processing systems.

Citation Formats

Chew, Peter A., Bader, Brett William, and Rozovskaya, Alla. Using DEDICOM for completely unsupervised part-of-speech tagging. United States: N. p., 2009. Web. doi:10.2172/978915.
Chew, Peter A., Bader, Brett William, & Rozovskaya, Alla. Using DEDICOM for completely unsupervised part-of-speech tagging. United States. doi:10.2172/978915.
Chew, Peter A., Bader, Brett William, and Rozovskaya, Alla. 2009. "Using DEDICOM for completely unsupervised part-of-speech tagging." United States. doi:10.2172/978915. https://www.osti.gov/servlets/purl/978915.
@techreport{osti_978915,
title = {Using {DEDICOM} for completely unsupervised part-of-speech tagging},
author = {Chew, Peter A. and Bader, Brett William and Rozovskaya, Alla},
abstractNote = {A standard and widespread approach to part-of-speech tagging is based on Hidden Markov Models (HMMs). An alternative approach, pioneered by Schuetze (1993), induces parts of speech from scratch using singular value decomposition (SVD). We introduce DEDICOM as an alternative to SVD for part-of-speech induction. DEDICOM retains the advantages of SVD in that it is completely unsupervised: no prior knowledge is required to induce either the tagset or the associations of terms with tags. However, unlike SVD, it is also fully compatible with the HMM framework, in that it can be used to estimate emission- and transition-probability matrices which can then be used as the input for an HMM. We apply the DEDICOM method to the CONLL corpus (CONLL 2000) and compare the output of DEDICOM to the part-of-speech tags given in the corpus, and find that the correlation (almost 0.5) is quite high. Using DEDICOM, we also estimate part-of-speech ambiguity for each term, and find that these estimates correlate highly with part-of-speech ambiguity as measured in the original corpus (around 0.88). Finally, we show how the output of DEDICOM can be evaluated and compared against the more familiar output of supervised HMM-based tagging.},
doi = {10.2172/978915},
institution = {Sandia National Laboratories},
number = {SAND2009-0842},
address = {United States},
url = {https://www.osti.gov/servlets/purl/978915},
year = {2009},
month = {2}
}
