
SciTech Connect

Title: Supervised Gamma Process Poisson Factorization

This thesis develops the supervised gamma process Poisson factorization (S-GPPF) framework, a novel supervised topic model for joint modeling of count matrices and document labels. S-GPPF is fully generative and nonparametric: document labels and count matrices are modeled under a unified probabilistic framework, and the number of latent topics is controlled automatically via a gamma process prior. The framework supports multi-class classification of documents using a generative max-margin classifier. Several recent data augmentation techniques are leveraged to enable exact inference using a Gibbs sampling scheme. The first portion of this thesis reviews supervised topic modeling and several key mathematical devices used in the formulation of S-GPPF. The thesis then introduces the S-GPPF generative model and derives the conditional posterior distributions of the latent variables for posterior inference via Gibbs sampling. S-GPPF is shown to exhibit state-of-the-art performance for joint topic modeling and document classification on a dataset of conference abstracts, outperforming competing supervised topic models. The unique properties of S-GPPF, along with its competitive performance, make it a novel contribution to supervised topic modeling.
Authors:
 [1]
  1. Univ. of Texas, Austin, TX (United States)
Publication Date:
OSTI Identifier:
1182679
Report Number(s):
SAND2015--3996T
583918
DOE Contract Number:
AC04-94AL85000
Resource Type:
Thesis/Dissertation
Research Org:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org:
USDOE National Nuclear Security Administration (NNSA)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING