U.S. Department of Energy
Office of Scientific and Technical Information

The computational complexity of alternative updating approaches for an SVD-encoded indexing scheme

Conference · OSTI ID:125464
;  [1];  [2]
  1. Univ. of Tennessee, Knoxville, TN (United States)
  2. Information Science Research Group, Morristown, NJ (United States)

Latent Semantic Indexing (LSI) is a conceptual indexing technique that uses the truncated singular value decomposition (SVD) to estimate the underlying latent semantic structure of word-to-document associations. By computing a lower-rank approximation to the original term-document matrix, LSI dampens the effects of word-choice variability, representing terms and documents by (orthogonal) left and right singular vectors, respectively. Current methods for adding new documents to an LSI database (folding-in documents) can degrade the orthogonality of the vectors used to represent documents in high-dimensional subspaces. An alternative approach is presented that updates the original truncated SVD so as to preserve the orthogonality among the document vectors corresponding to the new term-document matrix. The numerical cost and memory needed to update the SVD, weighed against the potential inaccuracy of the earlier updating methods, present an interesting tradeoff for LSI database management. The computational costs of three approaches are presented: recomputing the truncated SVD of perturbed term-document matrices, updating the current truncated SVDs of term-document matrices, and folding new documents into an existing LSI model.
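The orthogonality loss described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's updating algorithm: it only contrasts folding-in with a full recomputation of the truncated SVD, which serves here as the orthogonal reference. The matrix sizes, the random data, and the helper names (`truncated_svd`, `fold_in_documents`, `orthogonality_error`) are assumptions made for illustration, and the folding-in formula used (new document d mapped to d^T U_k S_k^{-1}) is the one commonly given in the LSI literature.

```python
import numpy as np


def truncated_svd(A, k):
    """Return rank-k factors U_k (m x k), s_k (k,), V_k (n x k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in_documents(U_k, s_k, D):
    """Project new document columns D (m x p) into the existing k-dim space.

    Each new document d gets coordinates d^T U_k S_k^{-1}; the existing
    factors U_k, s_k, V_k are left unchanged.
    """
    return (D.T @ U_k) / s_k


def orthogonality_error(V):
    """Frobenius-norm distance of V^T V from the identity matrix."""
    return np.linalg.norm(V.T @ V - np.eye(V.shape[1]))


# Hypothetical toy data: 12 terms, 8 original documents, 3 new documents.
rng = np.random.default_rng(0)
A = rng.random((12, 8))
D = rng.random((12, 3))
k = 4

U_k, s_k, V_k = truncated_svd(A, k)

# Folding-in: append the projected new documents as extra rows of V_k.
V_folded = np.vstack([V_k, fold_in_documents(U_k, s_k, D)])

# Reference: recompute the truncated SVD of the enlarged matrix [A | D].
_, _, V_recomputed = truncated_svd(np.hstack([A, D]), k)

print("original V_k:  ", orthogonality_error(V_k))
print("after fold-in: ", orthogonality_error(V_folded))
print("recomputed SVD:", orthogonality_error(V_recomputed))
```

In this sketch the columns of the original and recomputed document matrices stay orthonormal up to rounding, while the folded-in matrix does not, because the appended rows add a nonzero term to V^T V. Folding-in costs only a matrix product, whereas recomputation requires a new SVD of the enlarged matrix; this is the tradeoff the abstract refers to.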

OSTI ID: 125464
Report Number(s): CONF-950212--; CNN: Grant NSF-CDA-9115428; Grant NSF-ASC-92-03004
Country of Publication: United States
Language: English

Similar Records

On matrices with low-rank-plus-shift structure: Partial SVD and latent semantic indexing
Technical Report · August 1, 1998 · OSTI ID:663268

On updating problems in latent semantic indexing
Journal Article · October 1, 1999 · SIAM Journal on Scientific Computing · OSTI ID:20015659

On updating problems in latent semantic indexing
Technical Report · October 31, 1997 · OSTI ID:650342