Constrained Versions of DEDICOM for Use in Unsupervised Part-Of-Speech Tagging
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
This reports describes extensions of DEDICOM (DEcomposition into DIrectional COMponents) data models [3] that incorporate bound and linear constraints. The main purpose of these extensions is to investigate the use of improved data models for unsupervised part-of-speech tagging, as described by Chew et al. [2]. In that work, a single domain, two-way DEDICOM model was computed on a matrix of bigram fre- quencies of tokens in a corpus and used to identify parts-of-speech as an unsupervised approach to that problem. An open problem identi ed in that work was the com- putation of a DEDICOM model that more closely resembled the matrices used in a Hidden Markov Model (HMM), speci cally through post-processing of the DEDICOM factor matrices. The work reported here consists of the description of several models that aim to provide a direct solution to that problem and a way to t those models. The approach taken here is to incorporate the model requirements as bound and lin- ear constrains into the DEDICOM model directly and solve the data tting problem as a constrained optimization problem. This is in contrast to the typical approaches in the literature, where the DEDICOM model is t using unconstrained optimization approaches, and model requirements are satis ed as a post-processing step.
- Research Organization:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number:
- AC04-94AL85000
- OSTI ID:
- 1254278
- Report Number(s):
- SAND2016-4520; 640184
- Country of Publication:
- United States
- Language:
- English
Similar Records
An articulatorily constrained, maximum entropy approach to speech recognition and speech coding
Maximum-Likelihood Continuity Mapping (MALCOM): An Alternative to HMMs