U.S. Department of Energy
Office of Scientific and Technical Information

Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

Patent · OSTI ID: 1079419
A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
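The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The toy corpus, the suffix-stripping rule in `split_morphemes`, and the choice of log-entropy as the weighting function are all assumptions made for illustration; the abstract only states that "a weighting function" is used and does not say how wordforms are segmented. A truncated singular value decomposition stands in here for the factorization step.

```python
import numpy as np

# Hypothetical toy corpus; the abstract gives no example data.
corpus = [
    "walked walking talks",
    "talked talking walks",
    "jumps jumping jumped",
]

# Hypothetical suffix list standing in for a real morphological analyzer.
SUFFIXES = ("ing", "ed", "s")


def split_morphemes(wordform):
    """Split a wordform into a stem and at most one suffix (toy rule)."""
    for suffix in SUFFIXES:
        if wordform.endswith(suffix) and len(wordform) > len(suffix) + 1:
            return [wordform[: -len(suffix)], "-" + suffix]
    return [wordform]


def morpheme_by_document(docs):
    """Count stems and affixes as separate rows, one column per document."""
    counts = [{} for _ in docs]
    for j, doc in enumerate(docs):
        for wordform in doc.lower().split():
            for morpheme in split_morphemes(wordform):
                counts[j][morpheme] = counts[j].get(morpheme, 0) + 1
    vocab = sorted({m for c in counts for m in c})
    index = {m: i for i, m in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, c in enumerate(counts):
        for morpheme, n in c.items():
            X[index[morpheme], j] = n
    return vocab, X


def log_entropy_weight(X):
    """Log-entropy weighting (an assumed choice of weighting function).

    Requires at least two documents, since it divides by log(n_docs).
    """
    n_docs = X.shape[1]
    row_sums = X.sum(axis=1, keepdims=True)
    p = np.divide(X, row_sums, out=np.zeros_like(X), where=row_sums > 0)
    # p * log(p), with the convention 0 * log(0) = 0.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_weight = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(X) * global_weight[:, None]


def low_rank_approximation(W, k):
    """Rank-k approximation of W from its truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


vocab, X = morpheme_by_document(corpus)
W = log_entropy_weight(X)
W_k = low_rank_approximation(W, k=2)
```

In this toy corpus each affix occurs equally often in every document, so the assumed log-entropy weighting drives the affix rows toward zero while the stem rows keep most of their weight. A different weighting function would behave differently.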
Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization: USDOE
DOE Contract Number: AC04-94AL85000
Assignee: Sandia Corporation (Albuquerque, NM)
Patent Number(s): 8,290,961
Application Number: 12/352,621
OSTI ID: 1079419
Country of Publication: United States
Language: English

References (22)

Three models for the description of language journal September 1956
Finnish, Portuguese and Russian Retrieval with Hummingbird SearchServer™ at CLEF 2004 book January 2005
Co-ranking Authors and Documents in a Heterogeneous Network conference October 2007
The approximation of one matrix by another of lower rank journal September 1936
A Mathematical Theory of Communication journal October 1948
Indexing by latent semantic analysis journal September 1990
Mapping the backbone of science journal August 2005
Some mathematical notes on three-mode factor analysis journal September 1966
A method for multiple attribute decision making with incomplete weight information under uncertain linguistic preference relations conference November 2007
A Statistical Interpretation of Term Specificity and its Application in Retrieval journal January 1972
Using Linear Algebra for Intelligent Information Retrieval journal December 1995
Document classification using nonnegative matrix factorization and underapproximation conference May 2009
Discussion Tracking in Enron Email Using PARAFAC book January 2008
Formal grammar and information theory: together again? journal April 2000
  • Pereira, Fernando
  • Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, Vol. 358, Issue 1769 https://doi.org/10.1098/rsta.2000.0583
Statistical phrase-based translation conference January 2003
  • Koehn, Philipp; Och, Franz Josef; Marcu, Daniel
  • Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL '03 https://doi.org/10.3115/1073445.1073462
On certain formal properties of grammars journal June 1959
Toward the Logical Description of Languages in Their Phonemic Aspect journal January 1953
Enhancing multilingual latent semantic analysis with term alignment information conference January 2008
Unsupervised Learning of the Morphology of a Natural Language journal June 2001
A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices journal June 1964
Weighted Average Pointwise Mutual Information for Feature Selection in Text Categorization book January 2005
Improving the retrieval of information from external sources journal June 1991