DOE SBIR Phase II Final Report: Distributed Relevance Ranking in Heterogeneous Document Collections
This report summarizes the work performed on the SBIR Phase II project “Distributed Relevance Ranking in Heterogeneous Document Collections” at Deep Web Technologies (http://www.deepwebtech.com). We completed all of the tasks defined in our SBIR proposal work plan (see Table 1 - Phase II Tasks Status). The project was completed on schedule, and we deployed an initial production release of the software architecture at DOE-OSTI for the Science.gov Alliance's search portal (http://www.science.gov). We implemented a set of grid services that support the extraction, filtering, aggregation, and presentation of search results from numerous heterogeneous document collections. Illustration 3 depicts the services required to perform QuickRank™ filtering of content as defined in our architecture documentation; the services highlighted in green have been implemented. We tested our implementation in a multi-node grid deployment both within the Deep Web Technologies offices and in a heterogeneous, geographically distributed grid environment. We also performed a series of load tests that simulated 100 concurrent users submitting search requests to the system. These tests ran on one-, two-, and three-node grids with services distributed in a number of different configurations. The preliminary results indicate that our architecture will scale well across multi-node grid deployments, but further testing and experimentation, beyond the scope of this project, will be needed to determine scalability and resiliency requirements. A production-quality version (1.4) of the Science.gov Alliance's search portal based on our grid architecture was released in June 2006. This demonstration portal is currently available at http://science.gov/search30.
The portal allows the user to select from a number of collections grouped by category and to enter a query expression (see Illustration 1 - Science.gov 3.0 Search Page). After the user clicks “search”, a results page lists the results from the selected collections, ordered by relevance to the query expression. Our grid-based solution to deep web search and document ranking has already gained attention within DOE, other government agencies, and a Fortune 50 company. We are committed to the continued development of grid-based solutions to large-scale data access, filtering, and presentation problems within the domain of information retrieval and the more general categories of content management, data mining, and data analysis.
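The search flow described above (select collections, submit a query, receive one relevance-ordered list) can be illustrated with a minimal sketch. This is not the QuickRank™ algorithm or the project's grid services, which the abstract does not specify; the `Result` type, the `score` function (a simple term-overlap stand-in), the `federated_search` helper, and the sample collections are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    collection: str
    title: str
    score: float


def score(query: str, title: str) -> float:
    """Fraction of query terms found in the title (illustrative stand-in scorer)."""
    terms = set(query.lower().split())
    words = set(title.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def federated_search(query, collections, selected):
    """Search only the selected collections and merge hits by descending score."""
    hits = [
        Result(name, title, score(query, title))
        for name in selected
        for title in collections[name]
    ]
    # Drop non-matching documents, then rank across all collections at once.
    matches = [h for h in hits if h.score > 0]
    return sorted(matches, key=lambda h: h.score, reverse=True)


# Hypothetical collections, keyed by name; "space" is not selected below.
collections = {
    "energy": ["Solar cell efficiency", "Wind turbine design"],
    "health": ["Solar radiation and skin health"],
    "space": ["Mars rover design"],
}
ranked = federated_search("solar cell", collections, ["energy", "health"])
for r in ranked:
    print(f"{r.score:.2f}  [{r.collection}]  {r.title}")
```

In this sketch the ranking is computed over the merged hit list rather than per collection, so a strong match from one collection can outrank every result from another.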
- Research Organization: Deep Web Technologies
- Sponsoring Organization: USDOE Office of Science (SC)
- DOE Contract Number: FG02-03ER83822
- OSTI ID: 896967
- Report Number(s): DOE/ER/83822-1; TRN: US200721%%874
- Country of Publication: United States
- Language: English
Similar Records
Smart Grid Information Clearinghouse (SGIC)
Best practices mobile application for D and D KM-IT - 15712
Related Subjects
ARCHITECTURE
DATA ANALYSIS
DOCUMENTATION
IMPLEMENTATION
INFORMATION RETRIEVAL
MANAGEMENT
MINING
PRODUCTION
SCHEDULES
TESTING
Distributed Relevance Ranking
Heterogeneous Document Collections
Grid Services
Extraction
Filtering
Aggregation
Information Retrieval