Enhancing multilingual latent semantic analysis with term alignment information.

Chew, Peter A; Bader, Brett William

Conference · August 1, 2008

OSTI ID:947265

Latent Semantic Analysis (LSA) is based on the Singular Value Decomposition (SVD) of a term-by-document matrix for identifying relationships among terms and documents from co-occurrence patterns. Among the multiple ways of computing the SVD of a rectangular matrix X, one approach is to compute the eigenvalue decomposition (EVD) of a square 2 x 2 block composite matrix, with X and X^T in the off-diagonal blocks and zero matrices in the diagonal blocks. We point out that significant value can be added to LSA by filling in some of the values in the diagonal blocks (corresponding to explicit term-to-term or document-to-document associations) and computing a term-by-concept matrix from the EVD. For the case of multilingual LSA, we incorporate information on cross-language term alignments of the same sort used in Statistical Machine Translation (SMT). Since all elements of the proposed EVD-based approach can rely entirely on lexical statistics, hardly any price is paid for the improved empirical results. In particular, the approach, like LSA or SMT, can still be generalized to virtually any language(s); computation of the EVD requires resources similar to those of the SVD, since all the blocks are sparse; and the results of the EVD are just as economical as those of the SVD.
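The block construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy matrix `X`, the helper names `composite_matrix` and `top_eigenpairs`, and the single alignment weight of 1.0 are all assumptions chosen for illustration. It uses dense NumPy arrays for readability, whereas the abstract relies on sparse blocks, so a sparse symmetric eigensolver would be the more realistic choice at scale. The sketch first checks that, with zero diagonal blocks, the largest eigenvalues of the composite matrix match the singular values of X, and then fills in a term-to-term entry before taking the term rows of the leading eigenvectors as a term-by-concept matrix.

```python
import numpy as np

def composite_matrix(X, term_block=None, doc_block=None):
    """Build the symmetric block matrix [[D1, X], [X^T, D2]].

    D1 (terms x terms) and D2 (docs x docs) default to zero blocks,
    which is the standard SVD-via-EVD construction.
    """
    m, n = X.shape
    D1 = np.zeros((m, m)) if term_block is None else term_block
    D2 = np.zeros((n, n)) if doc_block is None else doc_block
    return np.block([[D1, X], [X.T, D2]])

def top_eigenpairs(B, k):
    """Return the k largest eigenvalues of symmetric B and their eigenvectors."""
    evals, evecs = np.linalg.eigh(B)          # eigh assumes B is symmetric
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
    return evals[order], evecs[:, order]

# Hypothetical toy term-by-document matrix: 4 terms, 3 documents.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
m = X.shape[0]

# Zero diagonal blocks: the top eigenvalues equal the singular values of X.
sing = np.linalg.svd(X, compute_uv=False)
evals_plain, _ = top_eigenpairs(composite_matrix(X), k=3)

# Filled term block: an assumed alignment between term 0 and term 1
# (for example, a translation pair), kept symmetric so eigh still applies.
alignment = np.zeros((m, m))
alignment[0, 1] = alignment[1, 0] = 1.0
B_aligned = composite_matrix(X, term_block=alignment)
evals_aligned, evecs_aligned = top_eigenpairs(B_aligned, k=2)

# The first m rows of the eigenvectors correspond to terms.
T_aligned = evecs_aligned[:m, :]
```

The alignment block must stay symmetric for the eigendecomposition to remain real and orthogonal; how the alignment weights are scaled relative to the entries of X is a modeling choice that this sketch does not address.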

OSTI does not have a digital full-text copy available.

Research Organization: Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC04-94AL85000

OSTI ID: 947265

Report Number(s): SAND2008-5394C; TRN: US200909%%12

Resource Relation: Conference: Proposed for presentation at the 22nd International Conference on Computational Linguistics held August 16-24, 2008 in Manchester, UK.

Country of Publication: United States

Language: English

Similar Records

Cross-language information retrieval using PARAFAC2.

Technical Report · May 1, 2007 · OSTI ID:947265

Bader, Brett William; Chew, Peter; Abdelali, Ahmed; +1 more

Latent morpho-semantic analysis : multilingual information retrieval with character n-grams and mutual information.

Conference · August 1, 2008 · OSTI ID:947265

Bader, Brett William; Chew, Peter A; Abdelali, Ahmed

Massively Parallel Latent Semantic Analyses using a Graphics Processing Unit

Journal Article · January 1, 2009 · Journal of Undergraduate Research · OSTI ID:947265

Cavanagh, Joseph M; Cui, Xiaohui

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
INFORMATION RETRIEVAL
MACHINE TRANSLATIONS
STANDARDIZED TERMINOLOGY
MATRICES
