DOE Patents, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

Abstract

A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
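The steps in the abstract (parse wordforms, count stems and affixes separately, apply a weighting function, factorize to a lower-rank approximation) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the toy corpus, the naive suffix-stripping `split_morphemes` helper, the choice of log-entropy as the weighting function, and truncated SVD as the factorization.

```python
import math
from collections import Counter

import numpy as np

# Hypothetical suffix list; a real system would use a proper morphological analyzer.
SUFFIXES = ("ing", "ed", "s")


def split_morphemes(wordform):
    """Naively split a wordform into a stem and, if present, one affix."""
    for suffix in SUFFIXES:
        stem = wordform[: -len(suffix)]
        if wordform.endswith(suffix) and len(stem) >= 3:
            return [stem, "-" + suffix]
    return [wordform]


def morpheme_by_document(corpus):
    """Count stems and affixes as separate rows, one column per document."""
    counts = [
        Counter(m for w in doc.lower().split() for m in split_morphemes(w))
        for doc in corpus
    ]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[m] for c in counts] for m in vocab], dtype=float)
    return vocab, matrix


def log_entropy_weight(counts):
    """Log-entropy weighting (an assumed choice); needs at least two documents."""
    n_docs = counts.shape[1]
    p = counts / counts.sum(axis=1, keepdims=True)
    # Treat 0 * log(0) as 0 by replacing zeros before taking the log.
    plogp = p * np.log2(np.where(p > 0, p, 1.0))
    global_weight = 1.0 + plogp.sum(axis=1) / math.log2(n_docs)
    return np.log2(1.0 + counts) * global_weight[:, None]


def low_rank_approximation(matrix, rank):
    """Rank-`rank` approximation obtained from a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Toy corpus, invented for this example.
corpus = [
    "walking dogs barked",
    "the dog walked",
    "cats walk and dogs bark",
]
vocab, counts = morpheme_by_document(corpus)
weighted = log_entropy_weight(counts)
approx = low_rank_approximation(weighted, rank=2)
```

In this sketch the stem `walk` and the affixes `-ing` and `-ed` occupy separate rows, so `walking`, `walked`, and `walk` all contribute to the same stem count. Morphemes that occur evenly across every document (here `walk` and `dog`) receive a weight of zero under log-entropy weighting, which is one reason a weighting step is usually applied before factorization.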

Inventors:
Chew, Peter A; Bader, Brett W
Issue Date:
2012-10-16
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1079419
Patent Number(s):
8290961
Application Number:
12/352,621
Assignee:
Sandia Corporation (Albuquerque, NM)
Patent Classifications (CPCs):
G - PHYSICS G06 - COMPUTING G06F - ELECTRIC DIGITAL DATA PROCESSING
DOE Contract Number:  
AC04-94AL85000
Resource Type:
Patent
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Chew, Peter A, and Bader, Brett W. Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix. United States: N. p., 2012. Web.
Chew, Peter A, & Bader, Brett W. (2012). Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix. United States.
Chew, Peter A, and Bader, Brett W. 2012. "Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix". United States. https://www.osti.gov/servlets/purl/1079419.
@article{osti_1079419,
title = {Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix},
author = {Chew, Peter A and Bader, Brett W},
abstractNote = {A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.},
place = {United States},
year = {2012},
month = {10}
}
