Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix
Patent · OSTI ID:1079419
A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
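The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the suffix list and the `split_morphemes` helper are hypothetical stand-ins for a real morphological analyzer, log-entropy is only one assumed choice for the unspecified "weighting function", and a truncated SVD is assumed as the factorization. The term-by-term alignment matrix is not covered.

```python
import numpy as np

# Hypothetical toy suffix list; a real system would use a morphological analyzer.
SUFFIXES = ("ing", "ed", "s")


def split_morphemes(wordform):
    """Split a wordform into a stem and, if one matches, a single affix."""
    for suffix in SUFFIXES:
        if wordform.endswith(suffix) and len(wordform) > len(suffix) + 2:
            return [wordform[:-len(suffix)], "-" + suffix]
    return [wordform]


def morpheme_by_document(corpus):
    """Count stems and affixes as separate rows for each document."""
    doc_morphemes = [
        [m for w in doc.lower().split() for m in split_morphemes(w)]
        for doc in corpus
    ]
    vocab = sorted({m for ms in doc_morphemes for m in ms})
    index = {m: i for i, m in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(corpus)))
    for j, ms in enumerate(doc_morphemes):
        for m in ms:
            counts[index[m], j] += 1
    return vocab, counts


def log_entropy_weight(counts):
    """Log-entropy weighting (an assumed weighting function).

    Requires at least two documents so that log(n_docs) is nonzero.
    """
    n_docs = counts.shape[1]
    p = counts / counts.sum(axis=1, keepdims=True)
    # Treat 0 * log(0) as 0 without triggering log(0).
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_docs)
    return global_w * np.log1p(counts)


def low_rank_approximation(matrix, k):
    """Rank-k approximation via truncated SVD (assumed factorization)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


corpus = ["walking dogs walked", "dogs barked loudly", "cats walk quietly"]
vocab, counts = morpheme_by_document(corpus)
weighted = log_entropy_weight(counts)
approx = low_rank_approximation(weighted, k=2)
```

In this toy corpus, "walking", "walked", and "walk" all contribute to a shared `walk` row, while `-ing` and `-ed` are counted in rows of their own, which is the separate enumeration of stems and affixes that the abstract refers to.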
- Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC04-94AL85000
- Assignee: Sandia Corporation (Albuquerque, NM)
- Patent Number(s): 8,290,961
- Application Number: 12/352,621
- OSTI ID: 1079419
- Country of Publication: United States
- Language: English
Similar Records
On matrices with low-rank-plus-shift structure: Partial SVD and latent semantic indexing
Technical Report · 1998 · OSTI ID:663268
Method and system of filtering and recommending documents
Patent · 2016 · OSTI ID:1237854
Document Retrieval and Ranking using Similarity Graph Mean Hitting Times
Technical Report · 2021 · OSTI ID:1835671