skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Enhancing multilingual latent semantic analysis with term alignment information.

Abstract

Latent Semantic Analysis (LSA) is based on the Singular Value Decomposition (SVD) of a term-by-document matrix for identifying relationships among terms and documents from co-occurrence patterns. Among the multiple ways of computing the SVD of a rectangular matrix X, one approach is to compute the eigenvalue decomposition (EVD) of a square 2 x 2 composite matrix consisting of four blocks with X and XT in the off-diagonal blocks and zero matrices in the diagonal blocks. We point out that significant value can be added to LSA by filling in some of the values in the diagonal blocks (corresponding to explicit term-to-term or document-to-document associations) and computing a term-by-concept matrix from the EVD. For the case of multilingual LSA, we incorporate information on cross-language term alignments of the same sort used in Statistical Machine Translation (SMT). Since all elements of the proposed EVD-based approach can rely entirely on lexical statistics, hardly any price is paid for the improved empirical results. In particular, the approach, like LSA or SMT, can still be generalized to virtually any language(s); computation of the EVD takes similar resources to that of the SVD since all the blocks are sparse; and the results of EVD are justmore » as economical as those of SVD.« less

Authors:
;
Publication Date:
Research Org.:
Sandia National Laboratories
Sponsoring Org.:
USDOE
OSTI Identifier:
947265
Report Number(s):
SAND2008-5394C
TRN: US200909%%12
DOE Contract Number:  
AC04-94AL85000
Resource Type:
Conference
Resource Relation:
Conference: Proposed for presentation at the 22nd International Conference on Computational Linguistics held August 16-24, 2008 in Manchester, UK.
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; INFORMATION RETRIEVAL; MACHINE TRANSLATIONS; STANDARDIZED TERMINOLOGY; MATRICES

Citation Formats

Chew, Peter A, and Bader, Brett William. Enhancing multilingual latent semantic analysis with term alignment information.. United States: N. p., 2008. Web.
Chew, Peter A, & Bader, Brett William. Enhancing multilingual latent semantic analysis with term alignment information.. United States.
Chew, Peter A, and Bader, Brett William. Fri . "Enhancing multilingual latent semantic analysis with term alignment information.". United States.
@article{osti_947265,
title = {Enhancing multilingual latent semantic analysis with term alignment information.},
author = {Chew, Peter A and Bader, Brett William},
abstractNote = {Latent Semantic Analysis (LSA) is based on the Singular Value Decomposition (SVD) of a term-by-document matrix for identifying relationships among terms and documents from co-occurrence patterns. Among the multiple ways of computing the SVD of a rectangular matrix X, one approach is to compute the eigenvalue decomposition (EVD) of a square 2 x 2 composite matrix consisting of four blocks with X and XT in the off-diagonal blocks and zero matrices in the diagonal blocks. We point out that significant value can be added to LSA by filling in some of the values in the diagonal blocks (corresponding to explicit term-to-term or document-to-document associations) and computing a term-by-concept matrix from the EVD. For the case of multilingual LSA, we incorporate information on cross-language term alignments of the same sort used in Statistical Machine Translation (SMT). Since all elements of the proposed EVD-based approach can rely entirely on lexical statistics, hardly any price is paid for the improved empirical results. In particular, the approach, like LSA or SMT, can still be generalized to virtually any language(s); computation of the EVD takes similar resources to that of the SVD since all the blocks are sparse; and the results of EVD are just as economical as those of SVD.},
doi = {},
url = {https://www.osti.gov/biblio/947265}, journal = {},
number = ,
volume = ,
place = {United States},
year = {2008},
month = {8}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.