Summary: A Parallel Relational Database Management System
Approach to Relevance Feedback in Information Retrieval
, Ophir Frieder2
, David Grossman3
, and David O. Holmes4
Abstract. A scalable, parallel, relational-database-driven information retrieval engine is described. To
support portability across a wide range of execution environments, including parallel multicomputers, all
algorithms strictly adhere to the SQL-92 standard. Incorporating relevance feedback algorithms significantly
improved accuracy over prior database-driven information retrieval efforts. Algorithmic
modifications to our earlier prototype significantly improved scalability. Currently, our
information retrieval engine sustains near-linear speedups using a 24-node parallel database machine.
Experiments using the TREC data collections are presented to validate the described approaches.
1. Introduction
The continued growth and acceptance of, and public reliance on, digital libraries have fostered
wide interest in information retrieval systems. Traditionally, customized approaches
were developed to provide information retrieval services. Recently, however, general solutions that
support traditional information retrieval functionality and integration of structured data and
text have appeared both in the research community, e.g., (DeFazio95, Grossman94,
Grossman97) and in the commercial sector, e.g., Oracle's ConText (Oracle97).