The computational complexity of alternative updating approaches for an SVD-encoded indexing scheme
- Univ. of Tennessee, Knoxville, TN (United States)
- Information Science Research Group, Morristown, NJ (United States)
Latent Semantic Indexing (LSI) is a conceptual indexing technique that uses the truncated SVD to estimate the underlying latent semantic structure of word-to-document associations. By computing a lower-rank approximation to the original term-document matrix, LSI dampens the effects of word-choice variability by representing terms and documents using (orthogonal) left and right singular vectors. Current methods for adding new documents to an LSI database (folding-in documents) can degrade the orthogonality of the vectors used to represent documents in high-dimensional subspaces. An alternative approach is presented that updates the original truncated SVD so as to preserve the orthogonality among the document vectors corresponding to the new term-document matrix. The computational cost and memory needed to update the SVD, weighed against the potential inaccuracy of earlier updating methods, present an interesting tradeoff for LSI database management. The computational costs of recomputing the truncated SVD of perturbed term-document matrices, of updating current truncated SVDs of term-document matrices, and of folding new documents into an existing LSI model are presented.
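The loss of orthogonality described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and does not reproduce the SVD-updating method of the paper; it only compares folding-in against recomputing the truncated SVD from scratch. It assumes the commonly used folding-in projection, in which a new document column d is mapped to d^T U_k S_k^{-1}, and all function names, matrix sizes, and the random test data are hypothetical choices made for illustration.

```python
import numpy as np


def truncated_svd(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x documents).

    Returns U_k (terms x k), s_k (k,), and V_k (documents x k).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in_documents(U_k, s_k, V_k, D):
    """Append new documents (columns of D) by projection onto the existing
    k-dimensional space: each column d becomes the row d^T U_k S_k^{-1}.

    The existing factors U_k and s_k are left unchanged (assumed formula).
    """
    D_hat = (D.T @ U_k) / s_k  # shape: (new documents, k)
    return np.vstack([V_k, D_hat])


def orthogonality_loss(V):
    """Frobenius norm of V^T V - I; zero when the columns are orthonormal."""
    k = V.shape[1]
    return np.linalg.norm(V.T @ V - np.eye(k))


# Hypothetical data: 50 terms, 20 original documents, 5 new documents.
rng = np.random.default_rng(0)
A = rng.random((50, 20))
D = rng.random((50, 5))
k = 5

U_k, s_k, V_k = truncated_svd(A, k)

# Cheap option: fold the new documents into the existing model.
V_folded = fold_in_documents(U_k, s_k, V_k, D)

# Expensive reference: recompute the truncated SVD of the enlarged matrix.
_, _, V_recomputed = truncated_svd(np.hstack([A, D]), k)

print("loss before folding-in:  ", orthogonality_loss(V_k))
print("loss after folding-in:   ", orthogonality_loss(V_folded))
print("loss after recomputation:", orthogonality_loss(V_recomputed))
```

In this sketch the folded-in document matrix satisfies V^T V = I + D_hat^T D_hat, so its departure from orthogonality grows with each batch of appended documents, whereas the recomputed factors stay orthonormal up to rounding error at the price of a full SVD.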
- OSTI ID:
- 125464
- Report Number(s):
- CONF-950212-; CNN: Grant NSF-CDA-9115428; Grant NSF-ASC-92-03004; TRN: 95:005768-0008
- Resource Relation:
- Conference: 7. Society for Industrial and Applied Mathematics (SIAM) conference on parallel processing for scientific computing, San Francisco, CA (United States), 15-17 Feb 1995; Other Information: PBD: 1995; Related Information: Is Part Of Proceedings of the seventh SIAM conference on parallel processing for scientific computing; Bailey, D.H.; Bjorstad, P.E.; Gilbert, J.R. [eds.] [and others]; PB: 894 p.
- Country of Publication:
- United States
- Language:
- English