skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: DOE SBIR Phase II Final Report: Distributed Relevance Ranking in Heterogeneous Document Collections

Abstract

This report contains the comprehensive summary of the work performed on the SBIR Phase II project (“Distributed Relevance Ranking in Heterogeneous Document Collections”) at Deep Web Technologies (http://www.deepwebtech.com). We have successfully completed all of the tasks defined in our SBIR Proposal work plan (See Table 1 - Phase II Tasks Status). The project was completed on schedule and we have successfully deployed an initial production release of the software architecture at DOE-OSTI for the Science.gov Alliance's search portal (http://www.science.gov). We have implemented a set of grid services that supports the extraction, filtering, aggregation, and presentation of search results from numerous heterogeneous document collections. Illustration 3 depicts the services required to perform QuickRank™ filtering of content as defined in our architecture documentation. Functionality that has been implemented is indicated by the services highlighted in green. We have successfully tested our implementation in a multi-node grid deployment both within the Deep Web Technologies offices, and in a heterogeneous geographically distributed grid environment. We have performed a series of load tests in which we successfully simulated 100 concurrent users submitting search requests to the system. This testing was performed on deployments of one, two, and three node grids with services distributed in amore » number of different configurations. The preliminary results from these tests indicate that our architecture will scale well across multi-node grid deployments, but more work will be needed, beyond the scope of this project, to perform testing and experimentation to determine scalability and resiliency requirements. We are pleased to report that a production quality version (1.4) of the science.gov Alliance's search portal based on our grid architecture was released in June of 2006. This demonstration portal is currently available at http://science.gov/search30 . The portal allows the user to select from a number of collections grouped by category and enter a query expression (See Illustration 1 - Science.gov 3.0 Search Page). After the user clicks “search” a results page is displayed that provides a list of results from the selected collections ordered by relevance based on the query expression the user provided. Our grid based solution to deep web search and document ranking has already gained attention within DOE, other Government Agencies and a fortune 50 company. We are committed to the continued development of grid based solutions to large scale data access, filtering, and presentation problems within the domain of Information Retrieval and the more general categories of content management, data mining and data analysis.« less

Authors:
Publication Date:
Research Org.:
Deep Web Technologies
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
896967
Report Number(s):
DOE/ER/83822-1
TRN: US200721%%874
DOE Contract Number:  
FG02-03ER83822
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ARCHITECTURE; DATA ANALYSIS; DOCUMENTATION; IMPLEMENTATION; INFORMATION RETRIEVAL; MANAGEMENT; MINING; PRODUCTION; SCHEDULES; TESTING; Distributed Relevance Ranking, Heterogeneous Document Collections, Grid Services, Extraction, Filtering, Aggregation, Information Retrieval

Citation Formats

Lederman, Abe. DOE SBIR Phase II Final Report: Distributed Relevance Ranking in Heterogeneous Document Collections. United States: N. p., 2007. Web. doi:10.2172/896967.
Lederman, Abe. DOE SBIR Phase II Final Report: Distributed Relevance Ranking in Heterogeneous Document Collections. United States. doi:10.2172/896967.
Lederman, Abe. Mon . "DOE SBIR Phase II Final Report: Distributed Relevance Ranking in Heterogeneous Document Collections". United States. doi:10.2172/896967. https://www.osti.gov/servlets/purl/896967.
@article{osti_896967,
title = {DOE SBIR Phase II Final Report: Distributed Relevance Ranking in Heterogeneous Document Collections},
author = {Lederman, Abe},
abstractNote = {This report contains the comprehensive summary of the work performed on the SBIR Phase II project (“Distributed Relevance Ranking in Heterogeneous Document Collections”) at Deep Web Technologies (http://www.deepwebtech.com). We have successfully completed all of the tasks defined in our SBIR Proposal work plan (See Table 1 - Phase II Tasks Status). The project was completed on schedule and we have successfully deployed an initial production release of the software architecture at DOE-OSTI for the Science.gov Alliance's search portal (http://www.science.gov). We have implemented a set of grid services that supports the extraction, filtering, aggregation, and presentation of search results from numerous heterogeneous document collections. Illustration 3 depicts the services required to perform QuickRank™ filtering of content as defined in our architecture documentation. Functionality that has been implemented is indicated by the services highlighted in green. We have successfully tested our implementation in a multi-node grid deployment both within the Deep Web Technologies offices, and in a heterogeneous geographically distributed grid environment. We have performed a series of load tests in which we successfully simulated 100 concurrent users submitting search requests to the system. This testing was performed on deployments of one, two, and three node grids with services distributed in a number of different configurations. The preliminary results from these tests indicate that our architecture will scale well across multi-node grid deployments, but more work will be needed, beyond the scope of this project, to perform testing and experimentation to determine scalability and resiliency requirements. We are pleased to report that a production quality version (1.4) of the science.gov Alliance's search portal based on our grid architecture was released in June of 2006. This demonstration portal is currently available at http://science.gov/search30 . The portal allows the user to select from a number of collections grouped by category and enter a query expression (See Illustration 1 - Science.gov 3.0 Search Page). After the user clicks “search” a results page is displayed that provides a list of results from the selected collections ordered by relevance based on the query expression the user provided. Our grid based solution to deep web search and document ranking has already gained attention within DOE, other Government Agencies and a fortune 50 company. We are committed to the continued development of grid based solutions to large scale data access, filtering, and presentation problems within the domain of Information Retrieval and the more general categories of content management, data mining and data analysis.},
doi = {10.2172/896967},
journal = {},
number = ,
volume = ,
place = {United States},
year = {2007},
month = {1}
}