
Title: Automatic Generation of Data Types for Classification of Deep Web Sources

Conference
DOI: https://doi.org/10.1007/11530084_21 · OSTI ID: 15016845

A Service Class Description (SCD) is an effective metadata-based approach for discovering Deep Web sources whose data exhibit regular patterns. However, creating an SCD manually is tedious and error-prone. Moreover, a manually created SCD does not adapt to the frequent changes of Web sources. It requires its creator to identify all the possible input and output types of a service a priori. In many domains, it is impossible to exhaustively list all the possible input and output data types of a source in advance. In this paper, we describe machine learning approaches for automatically generating the data types of an SCD. We propose two approaches for learning the data types of a class of Web sources. The Brute-Force Learner generates data types that achieve high recall but low precision. The Clustering-based Learner generates data types that achieve high precision but lower recall. We demonstrate the feasibility of these two learning-based solutions for automatically generating data types for citation Web sources and present a quantitative evaluation of both.

Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15016845
Report Number(s):
UCRL-CONF-209719; TRN: US200516%%1162
Resource Relation:
Journal Volume: 3615; Conference: Presented at: 2nd International Workshop on Data Integration in the Life Sciences, San Diego, CA, United States, Jul 20 - Jul 22, 2005
Country of Publication:
United States
Language:
English

Similar Records

An Abstract Description Approach to the Discovery and Classification of Bioinformatics Web Sources
Conference · May 1, 2003 · OSTI ID:15016845

Review and comparison of web- and disk-based tools for residential energy analysis
Journal Article · Aug 25, 2002 · Energy and Buildings · OSTI ID:15016845

Automatic Discovery and Inferencing of Complex Bioinformatics Web Interfaces
Journal Article · Dec 22, 2003 · World Wide Web · OSTI ID:15016845