Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix
Abstract
A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
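The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The suffix list, the function names, the choice of log-entropy weighting as the weighting function, and the use of truncated SVD as the factorization are all assumptions made for illustration, since the record does not specify them.

```python
import math
from collections import Counter

import numpy as np

# Hypothetical suffix list for illustration only; a real system would use
# a proper morphological analyzer rather than naive suffix stripping.
SUFFIXES = ("ing", "ed", "s")


def split_morphemes(wordform):
    """Split a wordform into a stem and, if one matches, a separate affix token."""
    for suffix in SUFFIXES:
        if wordform.endswith(suffix) and len(wordform) > len(suffix) + 2:
            return [wordform[: -len(suffix)], "-" + suffix]
    return [wordform]


def morpheme_by_document(corpus):
    """Build a count matrix whose rows enumerate stems and affixes separately."""
    counts = []
    for doc in corpus:
        morphemes = [m for w in doc.lower().split() for m in split_morphemes(w)]
        counts.append(Counter(morphemes))
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[m] for c in counts] for m in vocab], dtype=float)
    return vocab, matrix


def log_entropy_weight(matrix):
    """Apply log-entropy weighting (an assumed choice of weighting function)."""
    n_docs = matrix.shape[1]  # must be greater than 1
    row_totals = matrix.sum(axis=1, keepdims=True)
    p = np.divide(matrix, row_totals, out=np.zeros_like(matrix), where=row_totals > 0)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_weight = 1.0 + plogp.sum(axis=1) / math.log(n_docs)
    return np.log1p(matrix) * global_weight[:, None]


def low_rank_approximation(matrix, rank):
    """Factorize with SVD and keep only the leading `rank` components."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]


if __name__ == "__main__":
    corpus = ["the cat walked", "the dogs walked", "cats walking"]
    vocab, counts = morpheme_by_document(corpus)
    weighted = log_entropy_weight(counts)
    u, s, vt = low_rank_approximation(weighted, rank=2)
    print(vocab)
    print((u * s) @ vt)  # rank-2 approximation of the weighted matrix
```

Counting `walk` and `-ed` as separate rows lets documents that share a stem but differ in inflection still overlap in the matrix, which is the motivation for enumerating stems and affixes separately.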
- Inventors:
- Chew, Peter A.; Bader, Brett W.
- Issue Date:
- October 2012
- Research Org.:
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1079419
- Patent Number(s):
- 8290961
- Application Number:
- 12/352,621
- Assignee:
- Sandia Corporation (Albuquerque, NM)
- Patent Classifications (CPCs):
- G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- AC04-94AL85000
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Chew, Peter A., and Bader, Brett W. "Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix". United States: N. p., 2012. https://www.osti.gov/servlets/purl/1079419.
@article{osti_1079419,
title = {Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix},
author = {Chew, Peter A and Bader, Brett W},
url = {https://www.osti.gov/servlets/purl/1079419},
place = {United States},
year = {2012},
month = {10}
}