Latent morpho-semantic analysis: multilingual information retrieval with character n-grams and mutual information.

Bader, Brett William; Chew, Peter A; Abdelali, Ahmed

Conference · August 1, 2008

OSTI ID: 947254

New Mexico State University

We describe an entirely statistics-based, unsupervised, and language-independent approach to multilingual information retrieval, which we call Latent Morpho-Semantic Analysis (LMSA). LMSA overcomes some of the shortcomings of related previous approaches such as Latent Semantic Analysis (LSA). LMSA has an important theoretical advantage over LSA: it combines well-known techniques in a novel way to break the terms of LSA down into units which correspond more closely to morphemes. Thus, it has a particular appeal for use with morphologically complex languages such as Arabic. We show through empirical results that the theoretical advantages of LMSA can translate into significant gains in precision in multilingual information retrieval tests. These gains are not matched either when a standard stemmer is used with LSA, or when terms are indiscriminately broken down into n-grams.
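For orientation, the contrast the abstract draws between whole-word terms and character n-grams can be illustrated with a minimal sketch. This is not LMSA itself: it shows only the indiscriminate n-gram breakdown that the abstract names as a comparison point, and it omits the mutual-information step, which the abstract does not detail. The function names (`char_ngrams`, `ngram_counts`, `overlap`), the choice of n = 3, and the two sample strings are assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(word, n=3):
    """Return the overlapping character n-grams of a word.

    Words no longer than n are returned whole (an illustrative choice).
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_counts(text, n=3):
    """Count character n-grams over the whitespace-separated words of a text."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(char_ngrams(word, n))
    return counts


def overlap(a, b):
    """Number of distinct units that two count tables have in common."""
    return len(set(a) & set(b))


# Two hypothetical documents that use different inflections of the same stems.
doc_a = "retrieving documents"
doc_b = "document retrieval"

# Whole-word terms: the two documents share no term at all.
shared_words = overlap(Counter(doc_a.split()), Counter(doc_b.split()))

# Character trigrams: the shared stems surface as shared units.
shared_grams = overlap(ngram_counts(doc_a), ngram_counts(doc_b))

print(shared_words)  # 0
print(shared_grams)  # 11
```

With whole words as terms, the two strings have nothing in common; with trigrams they share eleven units drawn from the common stems. According to the abstract, LMSA goes beyond this indiscriminate breakdown by selecting units that correspond more closely to morphemes.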

OSTI does not have a digital full-text copy available.

Research Organization: Sandia National Laboratories

Sponsoring Organization: USDOE

DOE Contract Number: AC04-94AL85000

Report Number(s): SAND2008-5395C

Country of Publication: United States

Language: English

Similar Records

Enhancing multilingual latent semantic analysis with term alignment information.

Conference · Fri Aug 01 00:00:00 EDT 2008 · OSTI ID:947265

Cross-language information retrieval using PARAFAC2.

Technical Report · Tue May 01 00:00:00 EDT 2007 · OSTI ID:908061

Latent Morpho-Semantic Analysis: Multilingual Information Retrieval with Character N-Grams and Mutual Information.

Conference · Thu Jan 31 23:00:00 EST 2008 · OSTI ID:1146133

Related Subjects

99 GENERAL AND MISCELLANEOUS
ACCURACY
INFORMATION RETRIEVAL
MACHINE TRANSLATIONS
STANDARDIZED TERMINOLOGY
