OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Focused Crawling of the Deep Web Using Service Class Descriptions

Abstract

Dynamic Web data sources--sometimes known collectively as the Deep Web--increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed that of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web. To address these challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DynaBot has three unique characteristics. First, DynaBot utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular, self-tuning system architecture for focused crawling of the Deep Web using service class descriptions. Third, DynaBot incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and matching algorithms and suggest techniques for efficiently managing service discovery in the face of the immense scale of the Deep Web.

Authors:
Rocco, D; Liu, L; Critchlow, T
Publication Date:
2004-06-21
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
15014275
Report Number(s):
UCRL-CONF-204866
TRN: US200805%%393
DOE Contract Number:  
W-7405-ENG-48
Resource Type:
Conference
Resource Relation:
Conference: Presented at: International Conference on Service Oriented Computing, New York, NY, United States, Nov 15 - Nov 18, 2004
Country of Publication:
United States
Language:
English
Subject:
99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; 59 BASIC BIOLOGICAL SCIENCES; ALGORITHMS; COMPUTER ARCHITECTURE; INTERNET

Citation Formats

Rocco, D, Liu, L, and Critchlow, T. Focused Crawling of the Deep Web Using Service Class Descriptions. United States: N. p., 2004. Web.
Rocco, D, Liu, L, & Critchlow, T. Focused Crawling of the Deep Web Using Service Class Descriptions. United States.
Rocco, D, Liu, L, and Critchlow, T. 2004. "Focused Crawling of the Deep Web Using Service Class Descriptions". United States. https://www.osti.gov/servlets/purl/15014275.
@article{osti_15014275,
title = {Focused Crawling of the Deep Web Using Service Class Descriptions},
author = {Rocco, D and Liu, L and Critchlow, T},
abstractNote = {Dynamic Web data sources--sometimes known collectively as the Deep Web--increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed that of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web. To address these challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DynaBot has three unique characteristics. First, DynaBot utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular, self-tuning system architecture for focused crawling of the Deep Web using service class descriptions. Third, DynaBot incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and matching algorithms and suggest techniques for efficiently managing service discovery in the face of the immense scale of the Deep Web.},
url = {https://www.osti.gov/biblio/15014275},
place = {United States},
year = {2004},
month = jun
}

