skip to main content

SciTech ConnectSciTech Connect

Title: Constrained Versions of DEDICOM for Use in Unsupervised Part-Of-Speech Tagging

This reports describes extensions of DEDICOM (DEcomposition into DIrectional COMponents) data models [3] that incorporate bound and linear constraints. The main purpose of these extensions is to investigate the use of improved data models for unsupervised part-of-speech tagging, as described by Chew et al. [2]. In that work, a single domain, two-way DEDICOM model was computed on a matrix of bigram fre- quencies of tokens in a corpus and used to identify parts-of-speech as an unsupervised approach to that problem. An open problem identi ed in that work was the com- putation of a DEDICOM model that more closely resembled the matrices used in a Hidden Markov Model (HMM), speci cally through post-processing of the DEDICOM factor matrices. The work reported here consists of the description of several models that aim to provide a direct solution to that problem and a way to t those models. The approach taken here is to incorporate the model requirements as bound and lin- ear constrains into the DEDICOM model directly and solve the data tting problem as a constrained optimization problem. This is in contrast to the typical approaches in the literature, where the DEDICOM model is t using unconstrained optimization approaches, andmore » model requirements are satis ed as a post-processing step.« less
 [1] ;  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Publication Date:
OSTI Identifier:
Report Number(s):
DOE Contract Number:
Resource Type:
Technical Report
Research Org:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA)
Country of Publication:
United States